Abstract

We consider the problem of identifying underlying community-like structures in graphs. Towards
this end we study the Stochastic Block Model (SBM) on k clusters: a random model
on n = km vertices, partitioned into k equal-sized clusters, with edges sampled independently
across clusters with probability q and within clusters with probability p, p > q. The goal is to
recover the initial “hidden” partition of [n]. We study semidefinite programming (SDP) based
algorithms in this context. In the regime $p = \alpha \log(m)/m$ and $q = \beta \log(m)/m$ we show that a certain
natural SDP based algorithm solves the problem of exact recovery in the k-community SBM,
with high probability, whenever $\sqrt{\alpha} - \sqrt{\beta} > 1$, as long as $k = o(\log n)$. This threshold is
known to be information-theoretically optimal. We also study the case when $k = \Theta(\log(n))$.
In this case, however, we achieve recovery guarantees that no longer match the optimal condition
$\sqrt{\alpha} - \sqrt{\beta} > 1$, thus leaving optimality for this range as an open question.

Keywords: graph partitioning, random models, stochastic block model, semidefinite programming,
dual certificate


*namana@cs.princeton.edu, Computer Science, Princeton University

†bandeira@mit.edu, Department of Mathematics, Massachusetts Institute of Technology (most of the work presented
in this paper was conducted while this author was at Princeton University). ASB acknowledges support from
AFOSR Grant No. FA9550-12-1-0317

‡koiliar2@illinois.edu, Computer Science, University of Illinois, Urbana - Champaign
§akolla@illinois.edu, Computer Science, University of Illinois, Urbana - Champaign



1 Introduction 


Identifying underlying structure in graphs is a primitive question for scientists: can existing
communities be located in a large graph? Is it possible to partition the vertices of a graph
into strongly connected clusters? Several of these questions have been shown to be hard to answer,
even approximately, so instead of looking for worst-case guarantees attention has shifted
towards average-case analyses. In order to study such questions, the usual approach is to consider
a random [McS01] or a semi-random [FK01, MMV14] generative model of graphs, and use
it as a benchmark to test existing algorithms or to develop new ones. With respect to identifying
underlying community structure, the Stochastic Block Model (SBM) (or planted partition
model) has, in recent times, been one of the most popular choices. Its growing popularity
is largely due to the fact that its structure is simple to describe, but at the same time it
has interesting and involved phase transition properties which have only recently been discovered
([DKMZ11, MNS12, MNS13, ABH14, CX14, MNS14b, HWX14, HWX15, AS15, Ban15]).

In this paper we consider the SBM on k communities defined as follows. Let n be a multiple of
k, V = [n] be the set of vertices and $P = \{P_i\}$ be a partition of them into k equal-sized clusters
each of size $m = n/k$. Construct a random graph G on V by adding an edge for any two vertices
in the same cluster independently with probability p and any two vertices across distinct clusters
independently with probability q where p > q. We will write $G \sim \mathcal{G}_{p,q,k}$ to denote that a graph G
is generated from the above model. Given such a G the goal is to recover (with high probability)
the initial hidden partition P.
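As an illustration of the generative model just described, the following minimal sketch samples a graph with k planted clusters of size m. The function name `sample_sbm`, the convention that vertex v belongs to cluster v // m, and the adjacency-set representation are assumptions made for this example and are not taken from the paper.

```python
import random


def sample_sbm(k, m, p, q, seed=None):
    """Sample a graph with k planted clusters of size m (illustrative sketch).

    Vertices are 0..k*m-1 and vertex v is placed in cluster v // m
    (a labelling convention assumed for this example only).
    Returns (adjacency, labels), where adjacency[v] is the set of neighbors of v.
    """
    rng = random.Random(seed)
    n = k * m
    labels = [v // m for v in range(n)]
    adjacency = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            # Same cluster: edge with probability p; different clusters: probability q.
            prob = p if labels[u] == labels[v] else q
            if rng.random() < prob:
                adjacency[u].add(v)
                adjacency[v].add(u)
    return adjacency, labels
```

The double loop takes time quadratic in n, which is adequate for small experiments; recovering `labels` from `adjacency` alone is the task studied in the rest of the paper.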

The SBM can be seen as an extension of the Erdős-Rényi random graph model [ER59] with the
additional property of possessing a non-trivial underlying community structure (something which
the Erdős-Rényi model lacks). This richer structure not only makes this model interesting to study
theoretically, but also renders it closer to real world inputs, which tend to have a community
structure. It is also worth noting that, as pointed out in [CX14], a slight generalization of the SBM
encompasses several classical planted random graph problems including planted clique [AKS98],
[McS01], planted coloring [AK97], planted dense subgraph [AV13] and planted partition [Bop87,
CK01, FK01].

There are two natural problems that arise in the context of the SBM: exact recovery, where the
aim is to recover the hidden partition completely; and detection, where the aim is to recover the
partition better than what a random guess would achieve. In this paper we focus on exact recovery.
Note that exact recovery necessarily requires the hidden clusters to be connected (since otherwise
there would be no way to match the partitions in one component to another component) and it is
easy to see that the threshold for connectivity occurs when $p = \Omega(\log(m)/m)$. Therefore the right
scale for the threshold behavior of the parameters p, q is $\Theta(\log(m)/m)$, which is what we consider
in this paper.

In the case of two communities (k = 2) Abbe et al. [ABH14] recently established a sharp
phase transition phenomenon from information-theoretic impossibility to computational feasibility
of exact recovery. However, the existence of such a phenomenon in the case of k > 2 was left open
until solved, for $k = O(1)$, in independent parallel research [AS15, HWX15]. In this paper we
resolve the above, showing the existence of a sharp phase transition for $k = o(\log(n))$.

More precisely, in this work, we study a Semidefinite Programming (SDP) based algorithm
that, for $k = o(\log(n))$, recovers, for an optimal range of parameters, exactly the planted
k-partition of $G \sim \mathcal{G}_{p,q,k}$ with high probability. The range of the parameters p, q is optimal in the
following sense: it can be shown that this parameter range exhibits a sharp phase transition from
information-theoretic impossibility to computational feasibility through the SDP algorithm studied
in this paper. An interesting aspect of our result is that, for $k = o(\log(n))$, the threshold is the
same as for k = 2. This means that, even if an oracle reveals all of the cluster memberships
except for two, the problem has essentially the same difficulty. We also consider the case when
$k = \Theta(\log(n))$. Unfortunately, in this regime we can no longer guarantee exact recovery up to the
proposed information theoretic threshold. Similar behavior was observed and reported by Chen et
al. [CX14] and in our work we observe that the divergence between our information theoretic lower
bound and our computational upper bound sets in at $k = \Theta(\log(n))$. This is formally summarized
in the following theorems.

Theorem 1.1. Given a graph $G \sim \mathcal{G}_{p,q,k}$ with $k = O(\log(m))$ hidden clusters each of size m and
$p = \alpha \log(m)/m$ and $q = \beta \log(m)/m$, where $\alpha > \beta > 0$ are fixed constants, the semidefinite program (4),
with probability $1 - n^{-\Omega(1)}$, recovers the clusters when:

• for $k = o(\log n)$, as long as

$$\sqrt{\alpha} - \sqrt{\beta} > 1;$$


• for $k = (\gamma + o(1)) \log(n)$ for a fixed $\gamma$, as long as

$$\sqrt{\alpha} - \sqrt{\beta} > \sqrt{1 + c \ldots + \log \ldots},$$


where c is a universal constant. 


We complement the above theorem by showing the following lower bound which is a straightforward
extension of the lower bound for k = 2 from [ABH14].

Theorem 1.2. Given a graph $G \sim \mathcal{G}_{p,q,k}$ with k hidden clusters each of size m where k is $o(m^{\lambda})$
for any fixed $\lambda > 0$, if $p = \alpha \log(m)/m$ and $q = \beta \log(m)/m$, where $\alpha > \beta > 0$ are fixed constants, then it is
information theoretically impossible to recover the clusters exactly with high probability if

$$\sqrt{\alpha} - \sqrt{\beta} < 1.$$


Note that Theorem 1.2, together with Theorem 1.1, establishes a sharp phase transition between computational feasibility
and information theoretic impossibility when $k = o(\log(n))$. At $k \sim \log(n)$ we see that our lower
and upper bounds diverge. We leave as an open problem to determine whether such divergence is
necessary or a shortcoming of the SDP approach.
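To make the parameter regime concrete, the sketch below computes the edge probabilities on the $\log(m)/m$ scale and evaluates the threshold condition that appears in Theorems 1.1 and 1.2 for $k = o(\log n)$. The helper names `sbm_parameters` and `above_exact_recovery_threshold` are hypothetical and introduced only for this illustration.

```python
import math


def sbm_parameters(alpha, beta, m):
    """Return (p, q) with p = alpha*log(m)/m and q = beta*log(m)/m (natural log)."""
    scale = math.log(m) / m
    return alpha * scale, beta * scale


def above_exact_recovery_threshold(alpha, beta):
    """True when sqrt(alpha) - sqrt(beta) > 1, the condition stated for k = o(log n)."""
    return math.sqrt(alpha) - math.sqrt(beta) > 1.0


# Example: alpha = 9, beta = 1 gives sqrt(9) - sqrt(1) = 2, above the threshold,
# while alpha = 4, beta = 1.5 gives roughly 0.78, below it.
p, q = sbm_parameters(9.0, 1.0, 100)
```

The check is purely about the constants $\alpha$ and $\beta$; it says nothing about finite-size behavior for a particular m.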

At the heart of our argument is the following theorem which establishes a sufficient condition 
for exact recovery with high probability. 

Theorem 1.3. Let $G \sim \mathcal{G}_{p,q,k}$. With probability $1 - n^{-\Omega(1)}$ over the choice of G, if the following
condition is satisfied, the semidefinite program (4) recovers the hidden partition:

$$\min_i \Delta(i) > c\left(\sqrt{pn/k + qn} + q\sqrt{(n/k)\log(n)} + \sqrt{\log(n)} + \log(k)\right), \tag{1}$$

where c is a universal constant and $\Delta(i)$ is defined as the difference between the number of neighbors
a vertex i has in its own cluster and the maximum number of neighbors it has in any other cluster
(with respect to the hidden partition). In other words, with probability $1 - n^{-\Omega(1)}$, condition (1) implies exact
recovery.
We are able to give sharp guarantees for the semidefinite programming algorithm based essentially
on the behavior of inner and outer degrees of the vertices. This is achieved by constructing a
candidate dual certificate and using bounds on the spectral norm of random matrices to show that
the constructed candidate is indeed a valid one. The problem is then reduced to the easier task
of understanding the typical values of such degrees. Remarkably, the conditions required for these
quantities are very similar to the ones required for the problem to be information-theoretically
solvable (which essentially correspond to each node having larger in-degree than out-degree). This
helps explain the optimality of our algorithm. The approach of reducing the validity of a dual
certificate to conditions on an interpretable quantity appeared in [Ban15] for a considerably simpler
class of problems where the dual certificate construction is straightforward (which includes
the stochastic block model for k = 2 but not k > 2). In contrast, in the current setting, the dual
certificate construction is complex, requiring a different and considerably more involved analysis.
Moreover, the estimates we need (both of spectral norms and of inner and outer degrees) do not
fall under the class of the ones studied in [Ban15].

We also show that our algorithm recovers the planted partition exactly even in the presence of
a monotone adversary, a semi-random model defined in [FK01].

1.1 Related Previous and Parallel Work 

The graph partitioning problem has been studied over the years with various different objectives and
guarantees. A significant amount of recent literature has concentrated on the bipartition (bisection)
and the general k-partition (multisection) problems in random and semi-random models
([DKMZ11], [MNS12], [MNS13], [YP14], [MNS14a], [Mas14], [ABH14], [CX14], [MNS14b], [Vu14],
[CRV15]).

Some of the first results on partitioning random graphs were due to Bui et al. [BCLS84] who
presented algorithms for finding bipartitions in dense graphs. Boppana [Bop87] showed a spectral
algorithm that for a large range of parameters recovers a planted bipartition in a graph. Feige and
Kilian [FK01] present an SDP based algorithm to solve the problem of planted bipartition (along
with the problems of finding Independent Sets and Graph Coloring). Independently, McSherry
[McS01] gave a spectral algorithm that solved the problems of Multisection, Clique and Graph
Coloring.

More recently, a spate of results have established very interesting phase transition phenomena for
SBMs, both for the case of detection and exact recovery. For the case of detection, where the aim is
to recover partitions better than a random guess asymptotically, recent works of [MNS12, MNS13,
Mas14] established a striking sharp phase transition from information theoretic impossibility to
computational feasibility for the case of k = 2. For the case of exact recovery Abbe et al. [ABH14],
and independently [MNS14b], established the existence of a similar phase transition phenomenon
albeit at a different parameter range. More recently the same phenomenon was shown to exist for a
semidefinite programming relaxation, for k = 2, in [HWX14, Ban15]. However, the works described
above established phase transitions for k = 2 and the case of larger k was left open. Our paper
bridges the gap for larger k up to $o(\log(n))$ for the case of exact recovery. To put our work into
context, the corresponding case of establishing such behavior for the problem of detection remains
open. In fact, it is conjectured in [DKMZ11, MNS12] that, for the detection problem, there exists
a gap between the thresholds for computational feasibility and information theoretic impossibility
for any number of communities k greater than 4. In this paper, we show that this is not the case for
the exact recovery problem.
Chen et al. [CX14] also study the k-community SBM and provide convex programming based
algorithms and information theoretic lower bounds for exact recovery. Their results are similar to
ours in the sense that they also conjecture a separation between information theoretic impossibility
and computational feasibility as k grows. In comparison we focus strongly on the case of slightly
superconstant k ($o(\log(n))$) and mildly growing k ($\Omega(\log(n))$) and show exact recovery up to the
optimal (even up to constants) threshold in the former case. Very recently in independent and
parallel work, Abbe and Sandon [AS15] studied the problem of exact recovery for a fixed number
(k > 2) of communities where the symmetry constraint (equal cluster sizes and equal connection
probabilities across clusters) is removed. Our result, in contrast to theirs, is based
on the integrality of a semidefinite relaxation, which has the added benefit of producing an explicit
certificate for optimality (i.e. when the solution is “integral” we know for sure that it is the
optimal balanced k-partition). Abbe and Sandon [AS15] comment in their paper that their results
can be extended to slightly superconstant k but leave it as future work. In another parallel and
independent work, Hajek et al. [HWX15] study semidefinite programming relaxations for exact
recovery in SBMs and achieve results similar to ours. We remark that the semidefinite program
considered in [HWX15] is the same as the semidefinite program (4) considered by us (up to an
additive/multiplicative shift) and both works achieve the same optimality guarantee for $k = O(1)$.
They also consider the problem of SBM with 2 unequal sized clusters and the Binary Censored
Block Model. In contrast we show that the guarantees extend to the case where k is superconstant,
$o(\log(n))$, and provide sufficient guarantees for the case of $k = \Theta(\log(n))$, pointing to a possible
divergence between information theoretic possibility and computational feasibility at $k = \log(n)$
which we leave as an open question.

1.2 Preliminaries 

In this section we describe the notation and definitions which we use through the rest of the paper. 

Notation. Throughout the rest of the paper we will be reserving capital letters such as X for
matrices and with X[i,j] we will denote the corresponding entries. In particular, J will be used to
denote the all ones matrix and I the identity matrix. Let $A \bullet B$ be the element-wise inner product
of two matrices, i.e. $A \bullet B = \mathrm{Trace}(A^T B)$. We note that all the logarithms used in this paper
are natural logarithms, i.e. with base e.

Let G = (V, E) be a graph, n the number of vertices and A(G) its adjacency matrix. With
$G \sim \mathcal{G}_{p,q,k}$ we denote a graph drawn from the stochastic block model distribution as described
earlier with k denoting the number of hidden clusters each of size m. We denote the underlying
hidden partition with $\{P_t\}$. Let P(i) be the function that maps vertex i to the cluster containing
i. To avoid confusion in the notation note that $P_t$ denotes the t-th cluster while P(i) denotes
the cluster containing the vertex i. We now define a few quantities which will
be useful in further discussion of our results as well as their proofs. Define $\delta_{i \to P_t}$ to be the “degree”
of vertex i to cluster t. Formally

$$\delta_{i \to P_t} \triangleq \sum_{j \in P_t} A(G)[i,j].$$

Similarly for any two clusters $P_{t_1}, P_{t_2}$ define $\delta_{P_{t_1} \to P_{t_2}}$ as

$$\delta_{P_{t_1} \to P_{t_2}} \triangleq \sum_{i \in P_{t_1}} \sum_{j \in P_{t_2}} A(G)[i,j].$$
Define the “in degree” of a vertex i, denoted $\delta^{in}(i)$, to be the number of edges going from
the vertex to its own cluster

$$\delta^{in}(i) \triangleq \delta_{i \to P(i)};$$

also define $\delta^{out}_{\max}(i)$ to be the maximum “out degree” of a vertex i to any other cluster

$$\delta^{out}_{\max}(i) \triangleq \max_{P_t \neq P(i)} \delta_{i \to P_t}.$$

Finally, define

$$\Delta(i) \triangleq \delta^{in}(i) - \delta^{out}_{\max}(i).$$

$\Delta(i)$ will be the crucial parameter in our threshold. Remember that $\Delta(i)$ for A(G) is a random
variable and let $\Delta = \mathbb{E}[\Delta(i)]$ be its expectation (same for all i).
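The quantities defined above can be computed directly from an adjacency structure and the planted labels. The following sketch does so on a small hand-built graph; the helper names `degree_to_cluster` and `delta_gap`, the adjacency-set representation, and the toy graph itself are assumptions introduced for illustration.

```python
def degree_to_cluster(adjacency, labels, i, t):
    """Degree of vertex i into cluster t: number of neighbors of i labelled t."""
    return sum(1 for j in adjacency[i] if labels[j] == t)


def delta_gap(adjacency, labels, i, k):
    """Delta(i): in-degree of i minus its maximum out-degree to any other cluster.

    Assumes k >= 2 clusters labelled 0..k-1.
    """
    own = labels[i]
    d_in = degree_to_cluster(adjacency, labels, i, own)
    d_out_max = max(degree_to_cluster(adjacency, labels, i, t)
                    for t in range(k) if t != own)
    return d_in - d_out_max


# Toy example: two triangles joined by the single cross edge (2, 3).
labels = [0, 0, 0, 1, 1, 1]
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = [set() for _ in labels]
for u, v in edges:
    adjacency[u].add(v)
    adjacency[v].add(u)

gaps = [delta_gap(adjacency, labels, i, 2) for i in range(len(labels))]
```

In this toy graph the two endpoints of the cross edge have a gap of 1 and all other vertices have a gap of 2, so the minimum over vertices, the quantity that enters condition (1), equals 1.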

Paper Organization. The rest of this paper is structured as follows. In Section 2 we discuss the
two SDP relaxations we consider in the paper. We state sufficient conditions for exact recovery for
both of them as Theorem 2.1 and Theorem 2.2 (the latter is a restatement of Theorem 1.3) and
provide an intuitive explanation of why the condition (1) is sufficient for recovery up to the optimal
threshold. We provide formal proofs of Theorems 1.1 and 1.2 in the Appendix in Sections A.4 and
A.3 respectively. We provide the proof of Theorem 2.2 in Section 3. Further, in Section 4 we show
how our result can be extended to a semi-random model with a monotone adversary. Lastly, in the
Appendix we collect the proofs of all the lemmas and theorems left unproven in the main sections.

2 SDP relaxations and main results 

In this section we present two candidate SDPs which we use to recover the hidden partition. The
first SDP is inspired by the Max-k-Cut SDP introduced by Frieze and Jerrum [FJ95], where
we do not explicitly encode the fact that each cluster contains an equal number of vertices. In the
second SDP we explicitly encode the fact that each cluster has exactly m vertices. We state our
main theorems which provide sufficient conditions for exact recovery in both SDPs. Indeed the
latter SDP, being stronger, is the one we use to prove our main theorem, Theorem 1.1. Before
describing the SDPs let us first consider the Maximum Likelihood Estimator (MLE) of the hidden
partition. It is easy to see that the MLE corresponds to the following problem which we refer to
as the Multisection problem. Given a graph G = (V, E), divide the set of vertices into k clusters
$\{P_t\}$ such that for all $t_1, t_2$, $|P_{t_1}| = |P_{t_2}|$ and the number of edges $(u,v) \in E$ with $u \in P_{t_1}$
and $v \in P_{t_2}$ for $t_1 \neq t_2$ is minimized. (This problem has been studied under the name of
Min-Balanced-k-partition [KNS09].) In this section we consider two SDP relaxations for the Multisection problem.
Since SDPs can be solved in polynomial time, the relaxations provide polynomial time algorithms
to recover the hidden partitions.

A natural relaxation to consider for the problem of multisection in the Stochastic Block Model
is the Min-k-Cut SDP relaxation studied by Frieze and Jerrum [FJ95] (they actually study the
Max-k-Cut problem, but we can analogously study the min cut version too). The Min-k-Cut SDP
formulates the problem as an instance of Min-k-Cut where one tries to separate the graph into k
partitions with the objective of minimizing the number of edges cut by the partition. Note that the
k-Cut version does not have any explicit constraints for ensuring balancedness. However, studying
Min-k-Cut through SDPs has a natural difficulty: the relaxation must explicitly contain a constraint
that tells it to divide the graph into at least k clusters. In the case of SBMs with the parameters
$\alpha$ and $\beta$ one can try to overcome the above difficulty by making use of the fact that
the generated graph is very sparse. Thus, instead of looking directly at the Min-k-Cut objective
we can consider the following objective: minimizing the difference between the number of edges
cut and the number of non-edges cut. Indeed for sparse graphs the second term in the difference
is the dominant term and hence the SDP has an incentive to produce more clusters. Note that
the above objective can also be thought of as doing Min-k-Cut on the signed adjacency matrix
$2A(G) - J$ (where J is the all ones matrix). Following the above intuition we consider the following
SDP (2), which is inspired by the Max-k-Cut formulation of Frieze and Jerrum [FJ95]. In the
Appendix Section A.2 we provide a reduction, to the k-Cut SDP we study in this paper, from
a more general class of SDPs studied by Charikar et al. [CMM06] for Unique Games, and more
recently by Bandeira et al. [BCS15] in a more general setting.


$$\begin{aligned} \max\ & (2A(G) - J) \bullet Y \\ \text{s.t.}\ & Y_{ii} = 1 \quad (\forall i) \\ & Y \succeq 0. \end{aligned} \tag{2}$$

To see that the above SDP is a relaxation of the multisection problem note that for the hidden
partition $\{P_t\}$ we can define a candidate solution $Y^*$ as follows: $Y^*_{ij} = 1$ if i, j belong to the same
cluster and $Y^*_{ij} = -\frac{1}{k-1}$ if i, j belong to different clusters. Note that although the objective does not

directly minimize the number of edges cut, it is an additive/multiplicative shift of it. For the above 
SDP we prove the following theorem in the Appendix in Section A.7.1. Given $G \sim \mathcal{G}_{p,q,k}$, define

$$\nu(i) = \delta^{in}(i) - \max \ldots \frac{\delta_{P(j) \to P(i)}}{n/k}$$

Theorem 2.1. Let $G \sim \mathcal{G}_{p,q,k}$, with $p = \alpha \log(m)/m$ and $q = \beta \log(m)/m$ where $\alpha, \beta$ are constants. Consider
the SDP given by (2). With probability $1 - n^{-\Omega(1)}$ over the choice of G, if the following condition
is satisfied then the SDP recovers the hidden partition:

$$\min_i \nu(i) > c\left(\sqrt{pn/k + qn} + \sqrt{\log(n)}\right), \tag{3}$$

where c is a universal constant.

In other words, with probability $1 - n^{-\Omega(1)}$, condition (3) implies exact recovery.

The proof of the above theorem is included in the Appendix in Section A.7.1. We note that the
above condition is not an optimal one in terms of exact recovery and we discuss this issue next.
It is quite possible that the above SDP recovers the planted multisection all the way down to the
threshold; however, we have not been able to establish this and leave it as an open question. Indeed,
to prove our results we consider a stronger SDP with which we establish optimality. We have
empirically tested the performance of both the SDPs and include the results in the Appendix in
Section A.1. We now take a closer look at the above sufficient condition (3) and argue why the
condition is not strong enough to achieve optimal results. It is not hard to see that


m^)] 


n n 


fl 

q- log(n) 
Note that, in expectation, the maximization term in the definition of $\nu(i)$ has an extra $\log(n)$
term as the maximization runs through all i, j pairs. For the condition (3) to hold with at least a
constant probability, we expect that it needs to be the case that


Substituting the parameter range that we are interested in, $p = \alpha \log(m)/m$ and $q = \beta \log(m)/m$, we require

that

“ “ ° 

Indeed from the above expression it is clear that if $k \ll \log(n)$ the first term above dominates and
we cannot expect to get the tight results we hope for in Theorem 1.1. A closer look at the above
calculation reveals that the major barrier towards achieving the optimal result is the additional
$\log(n)$ factor due to the maximization over all i, j in the definition of $\nu(i)$. For instance if one
could replace the maximization term above with a term that takes the maximum per vertex over
all clusters one would pick up only a $\log(k)$ term (as there are only k clusters) and hopefully achieve
optimality.

In the context of the above discussion we suggest the following SDP in which we explicitly add a
per-row constraint fixing the number of vertices belonging to the same cluster as the vertex in
question.


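$$\begin{aligned} \max\ & A(G) \bullet Y \\ \text{s.t.}\ & \sum_j Y_{ij} + \sum_j Y_{ji} = 2n/k \quad (\forall i) \\ & Y_{ii} = 1 \quad (\forall i) \\ & Y_{ij} \geq 0 \quad (\forall i,j) \\ & Y \succeq 0. \end{aligned} \tag{4}$$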

To see that the above SDP is a relaxation of the MLE discussed above note that for any partition
$P = \{P_i\}$, we can associate a canonical n × n matrix $Y_P$ with it defined as

$$Y_P[i,j] = \begin{cases} 1 & \text{if vertices } i \text{ and } j \text{ belong to the same cluster} \\ 0 & \text{otherwise.} \end{cases}$$

Note that $Y_P$ satisfies the SDP constraints and the SDP maximizes the number of edges within
clusters, which is equivalent to minimizing the number of edges across the clusters. The row-sum
constraint above, since Y is symmetric, says that the sum of the values along each row is n/k, which
represents the number of vertices in a cluster. For the SDP above we show the following theorem
which is a restatement of Theorem 1.3.
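As a small sanity check of the claim that the canonical partition matrix is feasible, the sketch below builds $Y_P$ from cluster labels, verifies the linear constraints of SDP (4), and evaluates the objective $A(G) \bullet Y$. The function names and the four-vertex example are assumptions made for illustration; positive semidefiniteness is not checked numerically here (it follows from $Y_P$ being a sum of outer products of cluster indicator vectors).

```python
def partition_matrix(labels):
    """Canonical 0/1 matrix Y_P: entry (i, j) is 1 iff i and j share a cluster."""
    n = len(labels)
    return [[1 if labels[i] == labels[j] else 0 for j in range(n)]
            for i in range(n)]


def satisfies_linear_constraints(Y, k):
    """Check the diagonal, row/column-sum and non-negativity constraints of SDP (4).

    The PSD constraint is deliberately not checked in this sketch.
    """
    n = len(Y)
    for i in range(n):
        row_sum = sum(Y[i][j] for j in range(n))
        col_sum = sum(Y[j][i] for j in range(n))
        # (row_sum + col_sum) == 2n/k, written without division.
        if Y[i][i] != 1 or (row_sum + col_sum) * k != 2 * n:
            return False
        if any(Y[i][j] < 0 for j in range(n)):
            return False
    return True


def inner_product(A, Y):
    """Element-wise inner product A . Y = Trace(A^T Y)."""
    n = len(A)
    return sum(A[i][j] * Y[i][j] for i in range(n) for j in range(n))


# Toy example: clusters {0, 1} and {2, 3}; edges (0, 1), (2, 3) and one cross edge (1, 2).
labels = [0, 0, 1, 1]
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
Y = partition_matrix(labels)
feasible = satisfies_linear_constraints(Y, 2)
value = inner_product(A, Y)
```

Because A is symmetric, the objective counts every within-cluster edge twice, so `value` is twice the number of edges inside clusters.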

Theorem 2.2. Let $G \sim \mathcal{G}_{p,q,k}$. With probability $1 - n^{-\Omega(1)}$ over the choice of G, if the following
condition is satisfied then the SDP defined by (4) recovers the hidden partition:

$$\min_i \Delta(i) > c\left(\sqrt{pn/k + qn} + q\sqrt{(n/k)\log(n)} + \sqrt{\log(n)} + \log(k)\right). \tag{5}$$

In other words, with probability $1 - n^{-\Omega(1)}$, condition (5) implies exact recovery.
We remark that the above statement is indeed true for all values of p, q. For the specific range
that we are interested in we show in Section A.4 how condition (5) leads to the optimal threshold.
In the next section we provide an intuitive explanation of why this is so.

2.1 Optimality of Theorem 2.2 

In this section we give an intuitive high level explanation for the optimality of the condition in (5)
for $k \ll \log(n)$ in Theorem 2.2. We prove it formally in the appendix. As stated earlier the regime
we consider is the case when $p = \alpha \log(m)/m$ and $q = \beta \log(m)/m$ where $\alpha$ and $\beta$ are constants.

Note that for the MLE to succeed the values of p and q should be such that $\min_i (\delta^{in}(i) - \delta^{out}_{\max}(i)) > 0$
w.h.p., since otherwise one expects there to be many vertices i for which $\delta^{in}(i) - \delta_{i \to P_t} < 0$
for some $P_t \neq P(i)$, and in particular a pair $t_1, t_2$ such that there exist $i \in P_{t_1}$, $j \in P_{t_2}$
with $\delta^{in}(i) - \delta_{i \to P_{t_2}} < 0$ as well as $\delta^{in}(j) - \delta_{j \to P_{t_1}} < 0$. This would imply that we can exchange
the pair i, j and get a better partition than the planted partition, and therefore that the MLE itself
does not recover the hidden partition.

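Recall that $\Delta(i) = \delta^{in}(i) - \delta^{out}_{\max}(i)$. We now show that the deviation in $\Delta(i)$ required by Theorem
2.2 is $o(\mathbb{E}[\Delta(i)])$ and therefore, informally, one can expect that

$$\mathbb{P}\left(\min_i \Delta(i) > 0\right) \approx \mathbb{P}\left(\min_i \Delta(i) > o(\mathbb{E}[\Delta(i)])\right),$$

which implies that the SDP in Theorem 2.2 recovers the partition optimally. Indeed, the deviation
required in Theorem 2.2 is $o(\mathbb{E}[\Delta(i)])$:

$$\frac{\sqrt{pn/k + qn} + q\sqrt{(n/k)\log(n)} + \sqrt{\log(n)}}{\mathbb{E}[\Delta(i)]} \le \frac{O\left(\sqrt{\log(m)(\alpha + k\beta)}\right) + O\left(\sqrt{\log(n)}\right)}{\Omega\left((\alpha - \beta)\log(m)\right)} = o(1).$$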

Above we assumed that $k = o(\log(n))$. Following from the intuition above we prove Theorems
1.1 and 1.2 in the appendix which imply that our SDP is optimal.
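The asymptotic statement above can be explored numerically. The sketch below evaluates a proxy for the ratio between the required deviation and the expected gap; it is only an illustration under several assumptions: the universal constant c is set to 1, the $\log(k)$ term is omitted, the second deviation term is taken to be $q\sqrt{(n/k)\log(n)}$, and $(\alpha - \beta)\log(m)$ is used as a stand-in for the order of $\mathbb{E}[\Delta(i)]$. The function name `deviation_ratio` is hypothetical.

```python
import math


def deviation_ratio(alpha, beta, k, m):
    """Proxy for (required deviation) / E[Delta(i)] with c = 1 (illustrative only)."""
    n = k * m
    p = alpha * math.log(m) / m
    q = beta * math.log(m) / m
    deviation = (math.sqrt(p * n / k + q * n)
                 + q * math.sqrt((n / k) * math.log(n))
                 + math.sqrt(math.log(n)))
    # Stand-in for the order of E[Delta(i)]; an assumption of this sketch.
    expected_gap_scale = (alpha - beta) * math.log(m)
    return deviation / expected_gap_scale


# For fixed k the proxy shrinks as the cluster size m grows.
small = deviation_ratio(9.0, 1.0, 2, 10**3)
large = deviation_ratio(9.0, 1.0, 2, 10**6)
```

The dominant term decays only like $1/\sqrt{\log(m)}$, so the ratio decreases slowly; this is consistent with the statement being asymptotic rather than a finite-size guarantee.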

In the Appendix (Section A.1) we present an experimental evaluation of the two SDPs considered
in this section. The experiments corroborate Theorem 1.1 and also show that the SDP in (2)
experimentally seems to have a similar recovery performance as the (stronger) SDP in (4); however,
we could only prove a suboptimal result about it. We leave the possible optimality of the SDP in
(2) as an open question.

3 Proof of the main theorem 

In this section we prove our main theorem, Theorem 2.2 about the SDP defined by (4). We restate 
the SDP here. 


$$\begin{aligned} \max\ & A(G) \bullet Y \\ \text{s.t.}\ & \sum_j Y_{ij} + \sum_j Y_{ji} = 2n/k \quad (\forall i) \\ & Y_{ii} = 1 \quad (\forall i) \\ & Y_{ij} \geq 0 \quad (\forall i,j) \\ & Y \succeq 0. \end{aligned} \tag{6}$$











Let $Y^*$ be the matrix corresponding to the hidden partition $P^* = \{P_t\}$, i.e. $Y^*[i,j] = 1$ if i, j
belong to the same cluster and 0 otherwise. Let OPT(G) be the optimal value in the above SDP.
We will show that $Y^*$ is the unique solution to SDP (4) w.h.p. as long as the conditions in Theorem
2.2 are satisfied. This would prove Theorem 2.2. Our proof will be based on a dual certificate. In
that context consider the dual formulation of the above SDP which is the following

$$\begin{aligned} \min\ & \mathrm{Trace}(D) + (2n/k) \sum_i x_i \\ \text{s.t.}\ & D + \sum_i x_i (R_i + C_i) - Z - A \succeq 0, \end{aligned} \tag{7}$$

where D is a diagonal matrix, $x_i$ are scalars, Z is a non-negative symmetric matrix (corresponding
to the $Y_{ij} \geq 0$ constraints) with 0 in the diagonal entries, $R_i$ is the matrix with 1 in every entry of row
i and 0 otherwise, $C_i = R_i^T$ is the matrix with 1 in every entry of column i and 0 otherwise, and we
write A instead of A(G) when there is no fear of confusion.

Let DUAL(G) be the optimal value of the above dual program. We will first exhibit a valid
dual solution $M^* = (D^*, \{x_i^*\}, Z^*)$ which, with high probability, has dual objective value equal
to $A \bullet Y^*$. But since $A \bullet Y^* \leq OPT(G) \leq DUAL(G)$ (by weak duality) we get that $Y^*$ is
an optimal solution to the above SDP. We will also show uniqueness via complementary slackness.

Before moving on further it will be convenient to introduce the following definition which
will be used in the proof later. We also encourage the reader to revisit the Notation paragraph
(Section 1.2) at this time as it would help with the reading of what follows.

Definition 3.1. Given a partition of n vertices $\{P_t\}_{t=1}^{k}$ we define the vectors $\{v_t\}$ to be the indicator
vectors of the clusters. We further define the following subspaces, which are perpendicular to each
other, and partition $\mathbb{R}^n$.

• $\mathbb{R}_k$: the subspace spanned by the vectors $\{v_t\}$, i.e. the subspace of vectors with equal values
in each cluster,

• $\mathbb{R}_{\perp k}$: the subspace perpendicular to $\mathbb{R}_k$, i.e. the subspace where the sum on each cluster is
equal to 0.

At this point it is useful to look at what the complementary slackness condition implies. Since
strong duality holds in the case of our SDP (it is easy to check that Slater's conditions are satisfied),
complementary slackness holds, which implies that

$$\mathrm{Trace}(M^* Y^*) = \mathrm{Trace}\Bigl(M^* \sum_t v_t v_t^T\Bigr) = 0$$

for any optimal dual solution $M^*$. The above condition implies that for any such $M^*$ (since $M^*$ is
PSD) it must be that the subspace $\mathbb{R}_k$ is an eigenspace with eigenvalue 0, which implies

$$(\forall i, t) \quad \delta_{i \to P_t}(M^*) = 0. \tag{8}$$

Having established the conditions that must be satisfied by the optimal dual solution $M^*$, we
describe our candidate dual solution

$$(D^*, \{x_i^*\}, Z^*).$$
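We begin by describing the choice of $Z^*$. If vertices i and j belong to the same cluster then
$Z^*_{ij} = 0$, otherwise

$$Z^*_{ij} = \left(\frac{\delta^{out}_{\max}(i)}{n/k} - \frac{\delta_{i \to P(j)}}{n/k}\right) + \left(\frac{\delta^{out}_{\max}(j)}{n/k} - \frac{\delta_{j \to P(i)}}{n/k}\right) + \left(\frac{\delta_{P(i) \to P(j)}}{(n/k)(n/k)} - \min_{t_1 \neq t_2} \frac{\delta_{P_{t_1} \to P_{t_2}}}{(n/k)(n/k)}\right).$$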

It is easy to see that the matrix Z* is symmetric by noting that exchanging j and i in the above 
expression leads to the same value. Also to see that each entry of Z* is non-negative note that 
Z*[i,j] is the sum of non-negative terms. Having defined Z* as above we choose x* to be such that 
the condition given in Equation 8 holds for the non-diagonal blocks, yielding: 


$$x_i^* = \frac{\delta^{out}_{\max}(i)}{n/k} - \frac{1}{2} \min_{t_1 \neq t_2} \frac{\delta_{P_{t_1} \to P_{t_2}}}{(n/k)(n/k)}.$$


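And finally we define $D^*$ to balance out the sum along the diagonal blocks from A as well as
the $x_i^*$:

$$D^*[i,i] = \delta^{in}(i) - \delta^{out}_{\max}(i) - \sum_{j \in P(i)} \frac{\delta^{out}_{\max}(j)}{n/k} + \min_{t_1 \neq t_2} \frac{\delta_{P_{t_1} \to P_{t_2}}}{n/k}.$$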


Interestingly, this dual certificate construction seems to share some features with the one proposed
by Awasthi et al. [ABC+15] for an SDP relaxation for k-means clustering. While we were
not able to make a formal connection, it would be very interesting if the reason for the similarities
was the existence of some type of canonical way of building certificates for clustering problems; we
leave this for future investigations.

Now consider the objective for the dual program (7). It is easy to see that it is equal to

$$\mathrm{Trace}(D^*) + (2n/k) \sum_i x_i^* = \sum_i \delta^{in}(i) = A \bullet Y^*.$$

The following lemma, the proof of which we provide in the Appendix in Section A.6, implies that
the above mentioned solution is a valid dual solution, proving that $Y^*$ is an optimal solution to the
above program (by weak duality).
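The algebraic properties of the candidate certificate (though not its positive semidefiniteness, which is the content of the lemma) can be checked mechanically on a small instance. The sketch below is an illustration under assumptions: it takes $Z^*$ to be zero inside clusters and, across clusters, the sum of the three non-negative bracketed differences; $x_i^*$ to be the normalized maximum out-degree minus half the normalized minimum block count; and $D^*[i,i]$ to be $\Delta(i)$ minus the averaged maximum out-degrees of the cluster plus the normalized minimum block count. It uses exact rational arithmetic, and the function name `certificate_check` and the toy graph are hypothetical.

```python
from fractions import Fraction


def certificate_check(A, labels, k):
    """Build (D*, x*, Z*) as described in the lead-in and test two identities.

    Returns (rows_vanish, dual_value, primal_value), where rows_vanish says whether
    every row of M* sums to zero over every cluster (condition (8)).
    A must be a symmetric 0/1 matrix with zero diagonal; clusters have equal size.
    """
    n = len(A)
    m = n // k
    clusters = [[j for j in range(n) if labels[j] == t] for t in range(k)]
    deg = [[sum(A[i][j] for j in clusters[t]) for t in range(k)] for i in range(n)]
    d_in = [deg[i][labels[i]] for i in range(n)]
    d_out = [max(deg[i][t] for t in range(k) if t != labels[i]) for i in range(n)]
    block = [[sum(deg[i][t2] for i in clusters[t1]) for t2 in range(k)]
             for t1 in range(k)]
    mu = min(block[t1][t2] for t1 in range(k) for t2 in range(k) if t1 != t2)

    x = [Fraction(d_out[i], m) - Fraction(mu, 2 * m * m) for i in range(n)]
    D = [d_in[i] - d_out[i]
         - sum(Fraction(d_out[j], m) for j in clusters[labels[i]])
         + Fraction(mu, m) for i in range(n)]

    def Z(i, j):
        if labels[i] == labels[j]:
            return Fraction(0)
        return (Fraction(d_out[i] - deg[i][labels[j]], m)
                + Fraction(d_out[j] - deg[j][labels[i]], m)
                + Fraction(block[labels[i]][labels[j]] - mu, m * m))

    # Entry (i, j) of sum_i x_i (R_i + C_i) is x_i + x_j.
    M = [[(D[i] if i == j else 0) + x[i] + x[j] - A[i][j] - Z(i, j)
          for j in range(n)] for i in range(n)]
    rows_vanish = all(sum(M[i][j] for j in clusters[t]) == 0
                      for i in range(n) for t in range(k))
    dual_value = sum(D) + Fraction(2 * n, k) * sum(x)
    primal_value = sum(d_in)  # equals A . Y* for the planted partition
    return rows_vanish, dual_value, primal_value


# Toy example: two triangles joined by the cross edge (2, 3).
labels = [0, 0, 0, 1, 1, 1]
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for u, v in edges:
    A[u][v] = A[v][u] = 1
rows_vanish, dual_value, primal_value = certificate_check(A, labels, 2)
```

On this instance every row of $M^*$ sums to zero over each cluster and the dual objective coincides with $A \bullet Y^*$; whether $M^*$ is additionally PSD is what condition (5) is used to guarantee.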

Lemma 3.2. The matrix $M^* = D^* + \sum_i x_i^* (R_i + C_i) - A - Z^*$ (as defined above) is such that with
probability $1 - n^{-\Omega(1)}$, if the condition (5) is satisfied, then

$$M^* \succeq 0.$$


It is easy to show using complementary slackness that $Y^*$ is indeed the unique optimal solution
with high probability. For completeness we provide the proof in the Appendix in Section A.6.4.


4 Note about the Monotone Adversary 

In this section, we extend our result to the following semi-random model considered in the paper of
Feige and Kilian [FK01]. We first define a monotone adversary (we define it for the “homophilic”
case). Given a graph G and a partition $P = \{P_i\}$ a monotone adversary is allowed to take any of
the following two actions on the graph:

• Arbitrarily remove edges across clusters, i.e. (u,v) s.t. $P(u) \neq P(v)$.
• Arbitrarily add edges within clusters, i.e. (u,v) s.t. $P(u) = P(v)$.

Given a graph $G$ let $G_{adv}$ be the resulting graph after the adversary’s actions. The adversary is
monotone in the sense that the set of the optimal multisections in $G_{adv}$ contains the set of the
optimal multisections in $G$. Let $B(G)$ be the number of edges cut in the optimal multisection. We
now consider the following semi-random model, where we first randomly pick a graph $G \sim \mathcal{G}_{p,q,k}$
and then the algorithm is given $G_{adv}$, where the monotone adversary has acted on $G$. The following
theorem shows that our algorithm is robust against such a monotone adversary.

Theorem 4.1. Given a graph $G_{adv}$ generated by the semi-random model described above, we have that
with probability $1 - o(1)$ the algorithm described in Section 3 recovers the original (hidden) partition.
The probability is over the randomness in the production of $G \sim \mathcal{G}_{p,q,k}$ on which the adversary acts.

We provide the proof of the above theorem in the Appendix in Section A.8.
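To make the semi-random model concrete, the following is a minimal Python sketch of sampling a graph from the stochastic block model and then letting a monotone adversary act on it. The function names `sample_sbm` and `monotone_adversary`, the parameters `add_frac` and `remove_frac`, and the randomized choice of which edges to change are illustrative assumptions and do not come from the paper; a real monotone adversary may pick its edges arbitrarily.

```python
import random


def sample_sbm(k, m, p, q, rng):
    """Sample an SBM graph on n = k*m vertices; vertex v lies in cluster v // m."""
    n = k * m
    edges = set()
    for u in range(n):
        for v in range(u + 1, n):
            prob = p if u // m == v // m else q
            if rng.random() < prob:
                edges.add((u, v))
    return edges


def monotone_adversary(edges, k, m, rng, add_frac=0.5, remove_frac=0.5):
    """One hypothetical monotone adversary: it only removes across-cluster
    edges and only adds within-cluster edges."""
    n = k * m
    result = set()
    for (u, v) in edges:
        is_across = u // m != v // m
        if is_across and rng.random() < remove_frac:
            continue  # allowed action: remove an edge across clusters
        result.add((u, v))
    for u in range(n):
        for v in range(u + 1, n):
            same_cluster = u // m == v // m
            if same_cluster and (u, v) not in result and rng.random() < add_frac:
                result.add((u, v))  # allowed action: add an edge within a cluster
    return result


def within(edges, m):
    """Edges whose endpoints lie in the same cluster."""
    return {e for e in edges if e[0] // m == e[1] // m}


def across(edges, m):
    """Edges whose endpoints lie in different clusters."""
    return {e for e in edges if e[0] // m != e[1] // m}


rng = random.Random(0)
k, m = 3, 20
g = sample_sbm(k, m, p=0.5, q=0.1, rng=rng)
g_adv = monotone_adversary(g, k, m, rng)
```

By construction the within-cluster edges of `g` form a subset of those of `g_adv`, and the across-cluster edges of `g_adv` form a subset of those of `g`, which is exactly the monotonicity that the proof of Theorem 4.1 relies on.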
A Appendix

A.1 Experimental Evaluation

In this section we present some experimental results on the SDPs presented above. For both of the
SDPs we consider the case of $p = \alpha \frac{\log(m)}{m}$ and $q = \beta \frac{\log(m)}{m}$ with $k = 3$ and $m = 20$. We vary $\alpha$
and $\beta$ and for each pair of values we take 10 independent instances; the shade of grey in the
square represents the fraction of instances for which the SDP was integral, with lighter representing
higher fractions of integrality. The red lines represent the curve we prove in our main Theorem 1.1,
i.e. $\sqrt{\alpha} - \sqrt{\beta} > 1$.

Figure 1 corroborates Theorem 1.1: for the SDP in (4) we observe that experimentally the
performance almost exactly mimics what we prove. For the other (possibly) weaker SDP in (2) we
see in Figure 2 that the performance is almost the same as that of the stronger SDP; however, we were
unable to prove this formally, as discussed in Section 2. We leave it as an open question to show that
the SDP in (2) is integral all the way down to the information theoretic threshold (i.e. $\sqrt{\alpha} - \sqrt{\beta} > 1$). We
observe from the experiments above that this indeed seems to be the case.
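The parameter grid behind such an experiment can be sketched as follows. This is only an illustration of the parametrization, assuming the regime $p = \alpha \log(m)/m$, $q = \beta \log(m)/m$ used in Lemma A.4; the helper names and the grid spacing are hypothetical, and the SDP solve itself is deliberately left out.

```python
import math


def sbm_probabilities(alpha, beta, m):
    """Edge probabilities p = alpha*log(m)/m and q = beta*log(m)/m (assumed regime)."""
    return alpha * math.log(m) / m, beta * math.log(m) / m


def above_threshold(alpha, beta):
    """True when sqrt(alpha) - sqrt(beta) > 1, i.e. on the side of the red curve
    where exact recovery is proved."""
    return math.sqrt(alpha) - math.sqrt(beta) > 1


m = 20  # cluster size used in the experiments
rows = []
for a10 in range(10, 61, 5):        # alpha from 1.0 to 6.0 (hypothetical grid)
    for b10 in range(5, 31, 5):     # beta from 0.5 to 3.0 (hypothetical grid)
        alpha, beta = a10 / 10, b10 / 10
        p, q = sbm_probabilities(alpha, beta, m)
        if q < p <= 1:              # keep only valid, homophilic settings
            rows.append((alpha, beta, p, q, above_threshold(alpha, beta)))
```

Each entry of `rows` records one grid cell together with whether the proved condition holds there; the empirical fraction of integral SDP solutions would be computed per cell and compared against that flag.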
[Figure 1 (plot omitted). Panel title: Empirical probability of exact recovery over 10 random trials on each pair of values. Horizontal axis: alpha, from 1 to 3.]

Figure 1: Performance of SDP in (4). We consider the case of $p = \alpha \frac{\log(m)}{m}$ and $q = \beta \frac{\log(m)}{m}$ with
$k = 3$ and $m = 20$. We vary $\alpha$ and $\beta$ and for each pair of values we take 10 independent instances,
and the shade of grey in the square represents the fraction of instances for which the SDP was
integral, with lighter representing higher fractions of integrality. The red line represents the curve
we prove in our main Theorem 1.1, i.e. $\sqrt{\alpha} - \sqrt{\beta} > 1$.

A.2 The multireference alignment SDP for clustering 

In this section we describe an interesting connection between the SDPs used for clustering and 
partitioning problems and others such as ones used for the multireference signal alignment and the 
unique games problems. 

For illustrative purposes we will consider a slightly different version of the balanced k-cut 
(multisection) problem described earlier. Instead of imposing that the graph is partitioned in equal 
sized clusters, we will consider the objective value to be maximized to be the difference between the 
number of agreeing pairs and disagreeing pairs where an agreeing pair is a pair of nodes connected 
by an edge that was picked to be in the same cluster or a pair of points not connected by an edge 
that is not in the same cluster, and disagreeing pairs are all the others. Note that, if the balanced 
partition constraint was enforced, this objective would be equivalent to the multisection one. 
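The modified objective described above can be spelled out directly. The sketch below is an illustrative Python implementation; the function name `agreement_objective` and the toy graph are assumptions made for the example and are not taken from the paper.

```python
def agreement_objective(n, edges, labels):
    """Number of agreeing pairs minus number of disagreeing pairs.

    A pair agrees if it is connected and placed in the same cluster, or not
    connected and placed in different clusters; every other pair disagrees.
    """
    edge_set = {(min(u, v), max(u, v)) for (u, v) in edges}
    score = 0
    for u in range(n):
        for v in range(u + 1, n):
            connected = (u, v) in edge_set
            same_cluster = labels[u] == labels[v]
            score += 1 if connected == same_cluster else -1
    return score


# Toy graph: two triangles joined by the single edge (2, 3).
toy_edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
planted = [0, 0, 0, 1, 1, 1]      # one triangle per cluster
scrambled = [0, 1, 0, 1, 0, 1]    # a labeling that ignores the structure
planted_score = agreement_objective(6, toy_edges, planted)
scrambled_score = agreement_objective(6, toy_edges, scrambled)
```

On this toy instance the planted labeling scores higher than the scrambled one, as one would hope from an objective that rewards edges inside clusters and non-edges across them.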

The multireference alignment problem in signal processing [BCSZ14] consists of aligning $n$
signals $y_1, \ldots, y_n$ with length $k$ that are copies of a single signal but have been shifted and corrupted
with white Gaussian noise. For $a \in [k]$, we set $R_a$ to be the $k \times k$ matrix that shifts the entries of
a vector by $a$ coordinates. In this notation, the maximum likelihood estimator for the multireference
alignment problem is given by the shifts $l_1, \ldots, l_n \in [k]$ that maximize

$$\sum_{i,j=1}^{n} \left\langle R_{l_i}^T y_i, R_{l_j}^T y_j \right\rangle = \sum_{i,j=1}^{n} \operatorname{Tr}\left[ y_j y_i^T R_{l_i} R_{l_j}^T \right]. \qquad (9)$$

[Figure 2 (plot omitted). Panel title: Empirical probability of exact recovery over 10 random trials on each pair of values. Horizontal axis: alpha.]

Figure 2: Performance of SDP in (2). We consider the case of $p = \alpha \frac{\log(m)}{m}$ and $q = \beta \frac{\log(m)}{m}$ with
$k = 3$ and $m = 20$. We vary $\alpha$ and $\beta$ and for each pair of values we take 10 independent instances,
and the shade of grey in the square represents the fraction of instances for which the SDP was
integral, with lighter representing higher fractions of integrality. The red line represents the curve
we prove in Theorem 1.1, for the SDP (4), i.e. $\sqrt{\alpha} - \sqrt{\beta} > 1$.


A fruitful way of thinking about (9) is as a sum, over each pair $i,j$, of pairwise costs that
depend on the choices of shifts for the variables in each pair. An example of a problem of this type
is the celebrated Unique Games problem, and indeed the SDP approach developed in [BCSZ14] for
the multireference alignment problem is an adaptation of an SDP based approximation algorithm
for the Unique Games problem by Charikar et al. [CMM06]. The objective in the alignment
problem (9) has, however, an important property: the pairwise costs only depend on the relative
choices of shifts. More precisely, both $l_i$ and $l_j$ being increased by the same amount has no effect
on the pairwise cost relative to $(i,j)$. In fact, there is a general framework for solving problems
with this group invariance-type property, called non-unique games, when the group involved is
compact [BCS15]. The example above and SDP (2) that we will derive below are particular cases
of this framework, but it is more enlightening to derive the SDP we will use for partitioning from
the multireference alignment one.

To obtain an SDP for the partitioning problem, one can think of each node $i$ as a signal $y_i$ in
$\mathbb{R}^k$ and think of a shift label as a cluster membership. The cost associated to the pair $i,j$ should
then be: if the nodes are connected, $+1$ if the two signals are given the same shift and $-1$ otherwise;
if the nodes are not connected it should be $-1$ if the two signals are given the same shift and
$+1$ otherwise. This can be achieved by replacing $y_j y_i^T$ in the objective (9) by appropriate $k \times k$
matrices $C_{ij} = \frac{1}{k}\left(2I - \mathbf{1}\mathbf{1}^T\right)$ if $i$ and $j$ are connected and $C_{ij} = \frac{1}{k}\left(\mathbf{1}\mathbf{1}^T - 2I\right)$ if not. Our objective
would then be


E E - E Tr Hr,,R} 


a=l i,jeCa 


ij&[n] 


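where $R_{l_i}$ is constrained to be a circulant permutation matrix (a shift operator).
The SDP relaxation proposed in [BCSZ14] would then take the form

$$\begin{array}{ll}
\max & \operatorname{Tr}(CX) \\
\text{s.t.} & X_{ii} = I_{k \times k} \\
 & X_{ij}\mathbf{1} = \mathbf{1} \\
 & X_{ij} \text{ is circulant} \\
 & X \geq 0 \\
 & X \succeq 0.
\end{array} \qquad (10)$$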


It is clear, however, that (10) has many optimal solutions. Given an optimal selection of cluster
labelings, any permutation of these labels will yield a solution with the same objective. For that
reason we can adapt the SDP to consider the average of such solutions. This is achieved by
restricting each block $X_{ij}$ to be a linear combination of $I_{k \times k}$ and $\mathbf{1}\mathbf{1}^T$ (meaning that it is constant
both on the diagonal and on the off-diagonal). Adding that constraint yields the following SDP.


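$$\begin{array}{ll}
\max & \operatorname{Tr}(CX) \\
\text{s.t.} & X_{ii} = I_{k \times k} \\
 & X_{ij}\mathbf{1} = \mathbf{1} \\
 & X_{ij} \text{ is circulant} \\
 & (X_{ij})_{aa} = (X_{ij})_{11} \quad \forall a \\
 & (X_{ij})_{ab} = (X_{ij})_{12} \quad \forall a \neq b \\
 & X \geq 0 \\
 & X \succeq 0.
\end{array} \qquad (11)$$

Since the constraints in (11) imply

$$(X_{ij})_{11} + (k-1)(X_{ij})_{12} = 1,$$

(11) can be described completely in terms of the variables $(X_{ij})_{11}$. For that reason we consider the
matrix $Z \in \mathbb{R}^{n \times n}$ with entries $Z_{ij} = (X_{ij})_{11}$. We can then rewrite (11) as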


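$$\begin{array}{ll}
\max & \operatorname{Tr}(\tilde{C}Z) \\
\text{s.t.} & Z_{ii} = 1 \\
 & Z \geq 0 \\
 & Z^{(k)} \succeq 0,
\end{array} \qquad (12)$$

where $\tilde{C}_{ij} = k C_{ij}$ and $Z^{(k)}$ is the $nk \times nk$ matrix whose $n \times n$ diagonal blocks are equal to $Z$ and
whose $n \times n$ non-diagonal blocks are equal to $\frac{\mathbf{1}\mathbf{1}^T - Z}{k-1}$. For example,

$$Z^{(2)} = \begin{bmatrix} Z & \mathbf{1}\mathbf{1}^T - Z \\ \mathbf{1}\mathbf{1}^T - Z & Z \end{bmatrix}
\quad \text{and} \quad
Z^{(3)} = \begin{bmatrix}
Z & \frac{\mathbf{1}\mathbf{1}^T - Z}{2} & \frac{\mathbf{1}\mathbf{1}^T - Z}{2} \\
\frac{\mathbf{1}\mathbf{1}^T - Z}{2} & Z & \frac{\mathbf{1}\mathbf{1}^T - Z}{2} \\
\frac{\mathbf{1}\mathbf{1}^T - Z}{2} & \frac{\mathbf{1}\mathbf{1}^T - Z}{2} & Z
\end{bmatrix}.$$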

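The following lemma gives a simpler characterization for the intriguing $Z^{(k)} \succeq 0$ constraint.

Lemma A.1. Let $Z$ be a symmetric matrix and $k \geq 2$ an integer. $Z^{(k)} \succeq 0$ if and only if $Z \succeq \frac{1}{k}\mathbf{1}\mathbf{1}^T$.

Before proving Lemma A.1 we note that it implies that we can succinctly rewrite (12) as

$$\begin{array}{ll}
\max & \operatorname{Tr}(\tilde{C}Z) \\
\text{s.t.} & Z_{ii} = 1 \\
 & Z \geq 0 \\
 & Z \succeq \frac{1}{k}\mathbf{1}\mathbf{1}^T.
\end{array} \qquad (13)$$

A simple change of variables $Y = \frac{k}{k-1} Z - \frac{1}{k-1}\mathbf{1}\mathbf{1}^T$ allows one to rewrite (13) as (for appropriate
matrix $C'$ and constant $c'$),

$$\begin{array}{ll}
\max & \operatorname{Tr}(C'Y) - c' \\
\text{s.t.} & Y_{ii} = 1 \\
 & Y_{ij} \geq -\frac{1}{k-1} \\
 & Y \succeq 0.
\end{array} \qquad (14)$$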


Remarkably, (14) coincides with the classical semidefinite relaxation for the Max-k-Cut problem [FJ95],
which corresponds to (2) used in this paper.
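As a small check of the change of variables $Y = \frac{k}{k-1}Z - \frac{1}{k-1}\mathbf{1}\mathbf{1}^T$, the Python sketch below applies it entrywise to a planted $Z$ (assumed here, for illustration, to be 1 within clusters and 0 across clusters) and confirms that the result has ones within clusters and $-\frac{1}{k-1}$ across clusters. The helper names `planted_z` and `to_y` are hypothetical; exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction


def planted_z(labels):
    """Z[i][j] = 1 if i and j share a cluster and 0 otherwise (illustrative choice)."""
    n = len(labels)
    return [[Fraction(int(labels[i] == labels[j])) for j in range(n)] for i in range(n)]


def to_y(z, k):
    """Entrywise change of variables Y = k/(k-1) * Z - 1/(k-1) * (all-ones matrix)."""
    scale, shift = Fraction(k, k - 1), Fraction(1, k - 1)
    return [[scale * entry - shift for entry in row] for row in z]


labels = [0, 0, 1, 1, 2, 2]
k = 3
y = to_y(planted_z(labels), k)
diagonal_ok = all(y[i][i] == 1 for i in range(len(labels)))
lower_bound_ok = all(entry >= Fraction(-1, k - 1) for row in y for entry in row)
```

The constraints $Z_{ii} = 1$ and $Z \geq 0$ thus turn into $Y_{ii} = 1$ and $Y_{ij} \geq -\frac{1}{k-1}$, matching the form of (14).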


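Proof. [of Lemma A.1]

Since, in this proof, we will be using $\mathbf{1}$ to refer to the all-ones vector in two different dimensions,
we will include a subscript denoting the dimension of the all-ones vector.

The matrix $Z^{(k)}$ is block circulant and so it is block-diagonalizable by a block DFT matrix,
$F_{k \times k} \otimes I_{n \times n}$, where $F_{k \times k}$ is the $k \times k$ (normalized) DFT matrix and $\otimes$ is the Kronecker product.
In other words,

$$\left(F_{k \times k} \otimes I_{n \times n}\right) Z^{(k)} \left(F_{k \times k} \otimes I_{n \times n}\right)^*$$

is block diagonal. Furthermore, note that

$$Z^{(k)} = \mathbf{1}_k\mathbf{1}_k^T \otimes \frac{\mathbf{1}_n\mathbf{1}_n^T - Z}{k-1} + I_{k \times k} \otimes \left( Z - \frac{\mathbf{1}_n\mathbf{1}_n^T - Z}{k-1} \right).$$

Also, it is easy to check that

$$\left(F_{k \times k} \otimes I_{n \times n}\right) \left( I_{k \times k} \otimes \left( Z - \frac{\mathbf{1}_n\mathbf{1}_n^T - Z}{k-1} \right) \right) \left(F_{k \times k} \otimes I_{n \times n}\right)^* = I_{k \times k} \otimes \left( Z - \frac{\mathbf{1}_n\mathbf{1}_n^T - Z}{k-1} \right)$$

and

$$\left(F_{k \times k} \otimes I_{n \times n}\right) \left( \mathbf{1}_k\mathbf{1}_k^T \otimes \frac{\mathbf{1}_n\mathbf{1}_n^T - Z}{k-1} \right) \left(F_{k \times k} \otimes I_{n \times n}\right)^* = k \left( e_1 e_1^T \otimes \frac{\mathbf{1}_n\mathbf{1}_n^T - Z}{k-1} \right),$$

where $e_1$ denotes the first standard basis vector of $\mathbb{R}^k$.

This means that $\left(F_{k \times k} \otimes I_{n \times n}\right) Z^{(k)} \left(F_{k \times k} \otimes I_{n \times n}\right)^*$ is a block diagonal matrix with the first
block equal to $A$ and all other diagonal blocks equal to $B$, where $A$ and $B$ are given by

$$A = Z - \frac{\mathbf{1}_n\mathbf{1}_n^T - Z}{k-1} + k\,\frac{\mathbf{1}_n\mathbf{1}_n^T - Z}{k-1} = \mathbf{1}_n\mathbf{1}_n^T \quad \text{and} \quad B = Z - \frac{\mathbf{1}_n\mathbf{1}_n^T - Z}{k-1}.$$

Thus, the condition $Z^{(k)} \succeq 0$ is equivalent to $Z - \frac{\mathbf{1}_n\mathbf{1}_n^T - Z}{k-1} \succeq 0$, which can be rewritten as
$Z \succeq \frac{1}{k}\mathbf{1}_n\mathbf{1}_n^T$.

□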
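As a sanity check of Lemma A.1, the case $k = 2$ can be worked out by hand. The derivation below is an added illustration rather than part of the original argument; it writes $J = \mathbf{1}_n\mathbf{1}_n^T$ and uses the orthogonal matrix $U$ built from the $2 \times 2$ normalized DFT.

```latex
% Worked instance of Lemma A.1 for k = 2 (illustrative; J denotes the n x n all-ones matrix).
\[
Z^{(2)} = \begin{bmatrix} Z & J - Z \\ J - Z & Z \end{bmatrix},
\qquad
U = \frac{1}{\sqrt{2}} \begin{bmatrix} I & I \\ I & -I \end{bmatrix},
\qquad
U Z^{(2)} U^T = \begin{bmatrix} J & 0 \\ 0 & 2Z - J \end{bmatrix}.
\]
% Since J is always positive semidefinite, only the second block matters:
\[
Z^{(2)} \succeq 0
\iff 2Z - J \succeq 0
\iff Z \succeq \tfrac{1}{2} \mathbf{1}_n \mathbf{1}_n^T .
\]
```

This agrees with the general statement $Z \succeq \frac{1}{k}\mathbf{1}\mathbf{1}^T$ at $k = 2$.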


A.3 Proof of Optimality - Theorem 1.2 

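Proof. The theorem follows directly from the lower bound presented in [ABH14]. They showed
[ABH14, Theorem 1] that when we sample $G \sim \mathcal{G}_{p,q,2}$ with $p = \alpha' \frac{\log(n)}{n}$ and $q = \beta' \frac{\log(n)}{n}$, it is
information theoretically impossible to correctly recover the clusters with high probability if

$$\sqrt{\alpha'} - \sqrt{\beta'} < \sqrt{2}.$$

Now consider $G \sim \mathcal{G}_{p,q,k}$ with $p = \alpha \frac{\log(m)}{m}$ and $q = \beta \frac{\log(m)}{m}$. Suppose that the algorithm was
given the membership of vertices in all the clusters except two of them. A direct application of the
above theorem (to the $2m$ vertices of the two unrevealed clusters) yields that it is information
theoretically impossible to correctly recover the two unrevealed clusters with high probability if

$$\sqrt{\frac{2\alpha\log(m)}{\log(2m)}} - \sqrt{\frac{2\beta\log(m)}{\log(2m)}} < \sqrt{2},$$

which is equivalent to

$$\sqrt{\alpha} - \sqrt{\beta} < \sqrt{\frac{\log(2m)}{\log(m)}} = 1 + o_n(1),$$

which proves the bound.

□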


A.4 Proof of Optimality - Theorem 1.1 

Proof. We will use the condition of theorem 1.3 and the following lemma, to prove theorem 1.1. 

Lemma A.2. Let $p = \alpha \frac{\log(m)}{m}$, $q = \beta \frac{\log(m)}{m}$ and $k = \gamma \log(m)$ (where $\gamma = O(1)$). Then we have
that as long as

yfa- ^/l3> + ^1 + log ( 1 ^) 

then for sufficiently large n we have that with probability at least 1 — \/i,t 

6 "'^{i) - 5i^p^ > C 2 (^v^log(n) + y/a log(n)^ 
where C 2 > 0 be any fixed number and ci > 0 m (15) is a constant depending on C 2 
To complete the proof of Theorem 1.1 we first observe that for the given range of parameters
$p = \alpha \frac{\log(m)}{m}$ and $q = \beta \frac{\log(m)}{m}$, condition (5) in Theorem 1.3 becomes

c (^pnjk + qn + q^^\og{n) + v^log(n) + log(A:) 

However, Lemma A.2 implies that with probability 1 — we have that if condition 15 is 

satisfied then Vi, t 

5*"(i) - 5i^p^ > C 2 + i/alog(n)^ 

where C 2 > 0 depends on c. Therefore with probability 1 — the condition in (5) of Theorem 

1.3 is satisfied which in turn implies the SDP in Theorem 1.3 recovers the clusters, which concludes 
the proof of Theorem 1.1. Note that setting $\gamma = o(1)$ we get the case $k = o(\log(n))$ and the above
condition reduces to $\sqrt{\alpha} - \sqrt{\beta} > 1 + o_n(1)$. □


< 


C 2 /3k\og{m) + i/a log(n)^ 


In the rest of the section we prove Lemma A.2. For the remainder of this section we borrow 
the notation from Abbe et al. [ABH14]. In [ABH14, Definition 3, Section A.l], they define the 
following quantity T{m,p,q,S) which we use: 

Definition A.3. Let $m$ be a natural number, $p, q \in [0,1]$, and $\delta \geq 0$, define

$$T(m,p,q,\delta) = \mathbb{P}\left( \sum_{i=1}^{m} (Z_i - W_i) \geq \delta \right),$$

where $W_i$ are i.i.d. Bernoulli($p$) and $Z_i$ are i.i.d. Bernoulli($q$), independent of the $W_i$.

Let $Z = \sum_{i=1}^{m} Z_i$ and $W = \sum_{i=1}^{m} W_i$. The proof is similar to the proof of [ABH14, Lemma 8,
Section A.1] with modifications.


Proof, (of Lemma A. 2) We will bound the probability of the bad event 

- 6i^p^ < C2 (^v^log(n) + ^/alog{n)j . 

Note that $\delta_{in}(i)$ is a binomial variable with parameter $p$ and similarly $\delta_{i \to P_t}$ is a binomial variable
with parameter $q$ and therefore, following the notation of [ABH14], we have that the probability of
this bad event is

T (^m,p,g,-C2 (^V^log(n) + v^cnog(^^^ . 

We show the following strengthening of their lemma. 

Lemma A.4. Let $W_i$ be a sequence of i.i.d. Bernoulli$\left(\frac{\alpha\log(m)}{m}\right)$ random variables and $Z_i$ an
independent sequence of i.i.d. Bernoulli$\left(\frac{\beta\log(m)}{m}\right)$ random variables, then the following bound holds for
$m$ sufficiently large:


T ( m. 


a log(m) f5 log(m) 


m 


m 

exp 


-C 2 (^\/^log(n) + v^a log(n)^ 

- (^a + P - 2 - Cl + log ^ + 0 ( 1 )^ log(m)^ 


where C 2 > 0 is a fixed number and ci > 0 depends only on C 2 . 


( 16 ) 
Assuming the above lemma and taking a union bound over all clusters and vertices we get the
following sequence of inequalities, which proves Lemma A.2.


P - 5i^p, < C2 (^^/^log(n) + 

< mfc^exp ^a + /3 - 2i/^- cii/^ + log +o(l)^ log(m) 

< exp + /? - 2v^ - 1 - cia/^ + log ^ log(m) 

< 


□ 

Proof of Lemma A.4. The proof of Lemma A.4 is a simple modification of the proof of [ABH14,
Lemma 8, Section A.1]. We include the proof here for completeness.

Define r = C 2 (^'/^\og{n) + a log(n)^ < ci-v/^log(n) (for some fixed ci > 0 depending only 
on C 2 ) and let Z = ^ Zj and W = We split T as follows: 

$$T(m,p,q,-r) = \mathbb{P}\left(-r \leq Z - W \leq \log^2(m)\right) + \mathbb{P}\left(Z - W > \log^2(m)\right).$$

Let us bound the second term first. A simple application of Bernstein’s inequality (the calculations
are shown in [ABH14, Lemma 8, Section A.1]) shows that

V(Z-W> log^(m)) < exp ■ 

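We now bound the first term $\mathbb{P}\left(-r \leq Z - W \leq \log^2(m)\right)$. Define

$$\hat{r} = \operatorname{argmax}_x \, \mathbb{P}(Z - W = -x).$$

Now it is easy to see that $\hat{r} = O(\log(m))$ (for $p = \alpha \frac{\log(m)}{m}$ and $q = \beta \frac{\log(m)}{m}$). Let $r_{max} = \max(r, \hat{r})$
and $r_{min} = \min(r, \hat{r})$.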


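$$\begin{aligned}
\mathbb{P}\left(-r \leq Z - W \leq \log^2(m)\right)
&\leq \left(\log^2(m) + r_{max}\right) \mathbb{P}(Z - W = -r_{min}) \\
&\leq \left(\log^2(m) + r_{max}\right) \Bigg[ \sum_{k_2 = r_{min}}^{\log^2(m) + r_{min}} \mathbb{P}(Z = k_2 - r_{min})\,\mathbb{P}(W = k_2) \\
&\qquad\qquad + \sum_{k_2 = \log^2(m) + r_{min}}^{m} \mathbb{P}(Z = k_2 - r_{min})\,\mathbb{P}(W = k_2) \Bigg] \\
&\leq \left(\log^2(m) + r_{max}\right)^2 \max_{k_2} \left\{ \mathbb{P}(Z = k_2 - r_{min})\,\mathbb{P}(W = k_2) \right\} \\
&\qquad + \left(\log^2(m) + r_{max}\right) \mathbb{P}\left(Z \geq \log^2(m)\right) \mathbb{P}\left(W \geq \log^2(m)\right).
\end{aligned}$$

The first inequality follows easily from considering both the cases $\hat{r} \geq r$ and $\hat{r} < r$. Similar probability
estimates (using Bernstein) as before give that both

$$\mathbb{P}\left(Z \geq \log^2(m)\right),\ \mathbb{P}\left(W \geq \log^2(m)\right) \leq \exp\left( -\Omega(1) \frac{\log(m)}{\log(\log(m))} \right).$$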


We now need to bound $\max_{k_2} \left\{ \mathbb{P}(Z = k_2 - r)\,\mathbb{P}(W = k_2) \right\}$, for which we use Lemma A.5, which is
a modification of [ABH14, Lemma 7, Section A.1]. Plugging the estimates from above and noting

that maxfc 2 {P(Z = k 2 — r)P{W = ^ 2 )} = T* (^m,p, q, (defined in Lemma A. 5) we get that 


P {—r < Z — W < log^(m)) < 0{log^{n))T* (m,p,q 


Putting everything together we get that 
T{m,p,q,0) < 21og^(n)T* (m,p,q 


log(m) 


+ log"^(n) exp I —12(1) 


log(m) 
'log(log(m)) 


+log^(n) exp ( —12(1) 


log(m 

Using Lemma A.5 it follows from the above equation that 


log(m) 


log(log(m)) 


+exp ( — 12 ( 1 ) 


log(m) 

log(log(m)) 


-\og{T{m,p,q,-r)) > -f2(log(log(m))) + 5 

> + /3 — — Cl 

For the first inequality we use Lemma A.5 and set 
fact that e < ciy^dy. 


log(n) ) “ o(log(m)) 

^7 ^1 + log ^ ^ log(m) - o(log(m)) 

e = second inequality we use the 


□ 


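Lemma A.5. Let $p = \alpha \frac{\log(m)}{m}$ and $q = \beta \frac{\log(m)}{m}$ and let $W_i$ be a sequence of i.i.d. Bernoulli-$p$ random
variables and $Z_i$ an independent sequence of i.i.d. Bernoulli-$q$ random variables. Define

$$\begin{aligned}
V(m,p,q,\tau,\epsilon) &= \mathbb{P}\left( \sum Z_i = \tau\log(m) \right) \mathbb{P}\left( \sum W_i = (\tau + \epsilon)\log(m) \right) \\
&= \binom{m}{\tau\log(m)} q^{\tau\log(m)} (1-q)^{m - \tau\log(m)} \binom{m}{(\tau+\epsilon)\log(m)} p^{(\tau+\epsilon)\log(m)} (1-p)^{m - (\tau+\epsilon)\log(m)},
\end{aligned}$$

where $\epsilon = O(1)$. We also define the function

$$g(\alpha,\beta,\epsilon) = (\alpha + \beta) - \epsilon\log(\alpha) - 2\sqrt{\left(\frac{\epsilon}{2}\right)^2 + \alpha\beta} + \frac{\epsilon}{2} \log\left( \alpha\beta\, \frac{\sqrt{(\epsilon/2)^2 + \alpha\beta} + \epsilon/2}{\sqrt{(\epsilon/2)^2 + \alpha\beta} - \epsilon/2} \right).$$

Then we have the following result for $T^*(m,p,q,\epsilon) = \max_{\tau > 0} V(m,p,q,\tau,\epsilon)$: for $m \in \mathbb{N}$ and
$\forall \tau > 0$

$$-\log\left(T^*(m,p,q,\epsilon)\right) \geq \log(m)\, g(\alpha,\beta,\epsilon) - o(\log(m)).$$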



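Proof. The proof of the above lemma is computational and follows from carefully bounding the
combinatorial coefficients. Note that

$$\begin{aligned}
\log(V(m,p,q,\tau,\epsilon)) &= \log\binom{m}{\tau\log(m)} + \log\binom{m}{(\tau+\epsilon)\log(m)} + \tau\log(m)\log(pq) \\
&\quad + \epsilon\log(m)\log\left(\frac{p}{1-p}\right) + (m - \tau\log(m))\log((1-p)(1-q)).
\end{aligned}$$

Substituting the values of $p$ and $q$ we get

$$\begin{aligned}
\log(V(m,p,q,\tau,\epsilon)) &= \log\binom{m}{\tau\log(m)} + \log\binom{m}{(\tau+\epsilon)\log(m)} \\
&\quad + \tau\log(m)\left( \log(\alpha\beta) + 2\log\log(m) - 2\log(m) \right) \\
&\quad + \epsilon\log(m)\left( \log(\alpha) + \log\log(m) - \log(m) + \alpha\frac{\log(m)}{m} \right) \\
&\quad - \log(m)(\alpha + \beta) + o(\log(m)).
\end{aligned}$$

We now use the following easy inequality

$$\log\binom{n}{k} \leq k\left( \log(ne) - \log(k) \right),$$

and now replacing this in the above equation gives us

$$\begin{aligned}
-\log(V(m,p,q,\tau,\epsilon)) &\geq \log(m)\left( (\alpha + \beta) + (\tau + \epsilon)\log\left(\frac{\tau+\epsilon}{e}\right) + \tau\log\left(\frac{\tau}{e}\right) - \tau\log(\alpha\beta) - \epsilon\log(\alpha) \right) \\
&\quad - o(\log(m)). \qquad (17)
\end{aligned}$$

Now optimizing over $\tau$ proves the lemma.
□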
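The final optimization over $\tau$ can be checked numerically. The sketch below is illustrative Python, under the assumption that the coefficient of $\log(m)$ in (17) is $(\alpha+\beta) + (\tau+\epsilon)\log\frac{\tau+\epsilon}{e} + \tau\log\frac{\tau}{e} - \tau\log(\alpha\beta) - \epsilon\log(\alpha)$ and that $g$ has the closed form $(\alpha+\beta) - \epsilon\log\alpha - 2s + \frac{\epsilon}{2}\log\left(\alpha\beta\frac{s+\epsilon/2}{s-\epsilon/2}\right)$ with $s = \sqrt{(\epsilon/2)^2 + \alpha\beta}$. Setting the derivative in $\tau$ to zero gives $\tau(\tau+\epsilon) = \alpha\beta$; the code compares the value at that stationary point, and a brute-force grid minimum, with $g$. The function names and the sample parameter values are hypothetical.

```python
import math


def exponent(alpha, beta, eps, tau):
    """Coefficient of log(m) in (17) for a given tau > 0 (assumed form)."""
    return ((alpha + beta)
            + (tau + eps) * math.log((tau + eps) / math.e)
            + tau * math.log(tau / math.e)
            - tau * math.log(alpha * beta)
            - eps * math.log(alpha))


def g(alpha, beta, eps):
    """Closed form of the minimum over tau (assumed form of g)."""
    s = math.sqrt((eps / 2) ** 2 + alpha * beta)
    return ((alpha + beta) - eps * math.log(alpha) - 2 * s
            + (eps / 2) * math.log(alpha * beta * (s + eps / 2) / (s - eps / 2)))


alpha, beta, eps = 4.0, 1.0, 0.5  # sample values chosen only for illustration
# Stationary point: tau * (tau + eps) = alpha * beta.
tau_star = -eps / 2 + math.sqrt((eps / 2) ** 2 + alpha * beta)
value_at_tau_star = exponent(alpha, beta, eps, tau_star)
grid_min = min(exponent(alpha, beta, eps, t / 1000) for t in range(1, 10001))
```

For $\epsilon = 0$ the closed form reduces to $\alpha + \beta - 2\sqrt{\alpha\beta} = (\sqrt{\alpha} - \sqrt{\beta})^2$, which is where the threshold on $\sqrt{\alpha} - \sqrt{\beta}$ comes from.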


A.5 Proofs of Lemmas for the SDP in (4) 

A.6 Proof of Lemma 3.2

We remind the reader that the proof of the lemma below continues to use the notation introduced in
Section 3.


Proof. To prove this lemma we first show that Equation 8 is satisfied for $M^*$. This implies that
the vectors $\{v_t\}$, which are the indicator vectors of the clusters, are eigenvectors of $M^*$ with eigenvalue 0.
Consider the value of $(M^* v_t)[i]$ when $P_t = P(i)$. In this case




D* 




IT- * 


i'eP{i) i'eP{i) 


0 . 


where the last equality follows directly from the definitions of the dual certificate. Now consider
the value of $(M^* v_t)[i]$ when $P_t \neq P(i)$. In this case
= +

j&Pt j&Pt 

_ ^ * I ( ^max(0 , '^max(j ) 

k * ^ \ n/k n/k 


( ^i^P(j) 
\ njk 


+ A[hj] 


( ^j^Pji) , ^P{j)^P{i) \ _ . ^PH^Pt2 \ 

V n/k ^ {n/k){n/k)) t,M {n/k){n/k)) 


= ''lx* + X* - f <^maxO) 

k * ^ ^ ^ V n/k n/k 

jePt jePt ^ ' ' 


^Pu 


— min 


ti,t 2 [n/k){n/k) 


= 0 . 


+ 


The third equality follows by noting that the terms in the parenthesis in the expression in the 
second line go to zero in summation. The fourth equality follows directly from the definitions. 

The above implies that for all $t$, $M^* v_t = 0$. Therefore we only need to show that $M^*$ is PSD
with high probability on the subspace $\mathbb{R}_{n|k}$ (which is perpendicular to $\mathbb{R}_k = \operatorname{span}(\{v_t\})$). To that
end, note that if a matrix $W$ is such that for all $i$, $W[i,j_1] = W[i,j_2]$ when $P(j_1) = P(j_2)$, then
for any $x \in \mathbb{R}_{n|k}$, $Wx = 0$, and similarly if for all $j$, $W[i_1,j] = W[i_2,j]$ when $P(i_1) = P(i_2)$,
then for any $x \in \mathbb{R}_{n|k}$, $x^T W = 0$. Therefore we have that $x^T Z^* x = x^T (R_i + C_i) x = 0$ and so
$x^T M^* x = x^T D^* x - x^T A x$.

In order to finish the proof it is enough to show that for all $x \in \mathbb{R}_{n|k}$

$$x^T (D^* - A) x \geq 0.$$

In order to prove the above inequality, and conclude the proof of Theorem 2.2, we use the following
two lemmas, which we prove in the Appendix.

Lemma A.6. Define $\lambda_{max}(A(G))$ to be the maximum over all unit vectors $x \in \mathbb{R}_{n|k}$ of $x^T A(G) x$. With
high probability over the choice of $G$, $\lambda_{max}(A(G))$ is bounded by

$$\lambda_{max}(A(G)) \leq 3\sqrt{pn/k + qn} + c\sqrt{\log(n)}, \qquad (18)$$

where $c$ is a universal constant.

Lemma A.7. With probability 1 — we have that for all elusters Pt 


n/k ~ k 
j&Pt ' 


lnlog{k) 


k 


n . 


q + log(A:) + A/ ^ log(ra) • max < q, 


Iq\og{n) log(n) 
n/k ’ n/k 


and for all pairs of clusters Pt^ and Pt 2 


min 

tl,t 2 


^PH^Pt2 

n/k 


> y - 2 v^glog(n) . 


(19) 


( 20 ) 
Using those two lemmas, we can now conclude the proof of Theorem 2.2 as follows.
We separate $D^* = D_1 - D_2$, where $D_1, D_2$ are diagonal matrices


Di[iA= E 


j&P{i) 


out ( 

max \J ) _ 

n/k hfy n/k 

Now for any $x \in \mathbb{R}_{n|k}$ let us consider $x^T (D^* - A) x$:

$$x^T (D^* - A) x \geq \min_i D_1[i,i] - \left( \max_i D_2[i,i] + \max_{x \in \mathbb{R}_{n|k}} x^T A x \right)$$


, , - (30 y .... J 

+3y^pn/k + qn + cy^log(n)^ 

> min D\ [i, i]- c (^pn/k + qn + log(n) + \/log(n) + log(/c)^ 

> 0 . 

where c is a universal constant. The second inequality follows by direct substitutions from Equations 
18, 19, 20, the third inequlity follows from noting that n is large enough such that y/^ » y^log(n) 

and Y^log(n) ^‘^^^ << y/log{n) and < y/^. The last inequality follows from condition 

5 of Theorem 2.2. □ 

A.6.1 Proof of Lemma A.6 

We use the following recent sharp concentration result [BvH15, Corollary 3.12]. 

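Theorem A.8 (Bandeira et al. [BvH15]). Let $X$ be an $n \times n$ symmetric matrix whose entries
$X_{ij}$ are independent centered random variables. Then there exists for any $0 < \epsilon \leq 1/2$ a universal
constant $c_\epsilon$ such that for every $t \geq 0$

$$\mathbb{P}\left( \|X\| > (1 + \epsilon)\, 2\sqrt{2}\, \tilde{\sigma} + t \right) \leq n e^{-t^2 / (c_\epsilon \tilde{\sigma}_*^2)},$$

where

$$\tilde{\sigma} = \max_i \sqrt{\sum_j \mathbb{E}\left[X_{ij}^2\right]}, \qquad \tilde{\sigma}_* = \max_{ij} \|X_{ij}\|_\infty.$$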


We apply the above theorem to the matrix $A - \mathbb{E}[A]$. It is easy to see that $\tilde{\sigma}$ is upper bounded by

$$\tilde{\sigma} \leq \sqrt{p(1-p)n/k + q(1-q)n} \leq \sqrt{pn/k + qn},$$

and $\tilde{\sigma}_* \leq 1$. Applying Theorem A.8 with the above parameters $\tilde{\sigma} = \sqrt{pn/k + qn}$ and $\tilde{\sigma}_* = 1$, we
get that with high probability

$$\|A - \mathbb{E}[A]\| \leq 3\sqrt{pn/k + qn} + c'\sqrt{\log(n)},$$

where $c'$ is a universal constant defined as $c' = 2c_\epsilon$ for $\epsilon = \frac{3}{2\sqrt{2}} - 1$, with $c_\epsilon$ defined by the statement
of Theorem A.8. Also note that $\mathbb{E}[A] + pI$ has the space $\mathbb{R}_{n|k}$ as an eigenspace with eigenvalue 0.
Therefore we have that for any unit vector $x \in \mathbb{R}_{n|k}$

$$\begin{aligned}
|x^T A x| &\leq \|A - \mathbb{E}[A]\| + |x^T \mathbb{E}[A] x| \\
&\leq 3\sqrt{pn/k + qn} + c'\sqrt{\log(n)} + p \\
&\leq 3\sqrt{pn/k + qn} + c\sqrt{\log(n)},
\end{aligned}$$

where $c = c' + 1$ suffices since $p \leq 1 \leq \sqrt{\log(n)}$ for $n \geq 3$. This proves Lemma A.6.


A.6.2 Proof of Lemma A.7 

Proof. We prove Lemma A.7 using the following, which we prove in Subsection A.6.3.

Lemma A.9. For every vertex $i$ we have that




Using this, the proof of Lemma A.7 is as follows. Note that by a direct application of the
Chernoff bound described in Corollary A.15 and with a union bound over all clusters and vertices
we get that, with probability at least $1 - 1/n$, for all vertices $i$ and all clusters $P_t \neq P(i)$


5 i^Pt ^ y y log(n) + 12 log(n) . 


Let us call the above event $\mathcal{E}$ and consider the sum

S{i) = 

Let 


Si'eP(i) '^max(*) 
njk 


qn 

, = ^+30 


n'^og{k) /rT ~ I qlog{n) log(n) 

—+ log(fc) + ^ - log(n) . max 


We have that 


P (3i S{i) > 7 ) = P(T)P {3i S{i) > 7 1 f) + P(~ £)¥ (3f S{i) > 7 1 ~ T) 

< + P {3i S{i) > 7 I ~ f) . 

Now for a fixed i we will consider P (S'(i) > 7 | ~ <?). Note that under the conditioning the in¬ 
dividual entries in the sum above are still independent, and therefore the above is an average of 
independent random variables each of which is bounded by ^ 12y^^ log(n) -|- 121og(n) (by the 
conditioning). Also note that for any positive random variable X 


E[A| ~ T] < 


P(~ £) ’ 
and since we have that $\mathbb{P}(\sim\mathcal{E}) \geq 1 - 1/n$, we get that


E[ 5 (i) I ~ f] < E[S{i)] + 


ns{{)] 

n — 1 


We now use Hoeffding’s inequality A.16 in the conditioned probability space (and remove the
conditioning terms from the probability for ease of notation) to get that


/ 


P( 5 '(i) > E[ 5 '(i)] + t) < exp 
Now, if we choose 




|(f+ 12^f log(n) + 121og(„))' 


n 


t = 25^1 — log(n) • max < q, 


Iqlog{n) log(n) 
n/k ’ n/k 


and apply a union bound we get that with 

P Si > E[5i] +11 ~ £1]) < , 

and now substituting the value of $\mathbb{E}[S(i) \mid \mathcal{E}]$ from before and being extremely liberal with the
constants, for $n$ large enough we have that


P 5 (i) > y + 30 




max 



q log(n) 

n/k 



< n 


- 0 ( 1 ) 


To show the second equation, note that for any pair of clusters $t_1, t_2$, $\delta_{P_{t_1} \to P_{t_2}}$ is a sum of $(n/k)^2$
independent random variables. Therefore by a Chernoff bound from the second part of Theorem
A.13 and a union bound we get that with high probability


min 

tl,t2 


n/k 


> y - 2v^glog(n) . 


□ 


A.6.3 Proof of Lemma A.9 

Proof. Consider $\delta^{max}(i)$ for some $i$; this is defined to be the maximum of $k$ random variables $S_j$
with $S_j \sim \mathrm{Bin}(n/k, q)$ (the binomial distribution with parameters $n/k, q$) with variance $\frac{n}{k}\sigma^2$ where
$\sigma^2 = q(1-q)$. Consider $\tilde{S}_j = S_j - \mathbb{E}[S_j]$. Let $\gamma = \sigma\sqrt{\frac{n\log(k)}{k}} + \log(k)$. From Corollary A.15 we get
that

P > 4 (t + 1)7) < ^ , 
therefore by a union bound we get that the 

) 4 ' 


P ( maxiSj > 4(t + 1)7 
Hence, we can bound the expectation by


E[max 5 j] < 47 + y^ 4 (t + 1)7 P ( maxSj > 4t7 

\ i 


It follows from the above that 


t=i 

00 


1 


< 47 +J^ 4 (t + 1)7— 

t=i 

< 47 + 47 + 

< 47 + 247 

< 28 ( a\ 


Kt=l 


nlog(k) 


k 


+^og{k) 




n 


E[Cax«]<^g + 28 


n log(fc) 


k 


q + log(/c) 




□ 


A.6.4 Proof of Uniqueness of the solution 

In this section we prove that $Y^*$ is the unique optimal solution to the SDP considered in Section
3. To remind the reader, $M^*$ was the candidate dual solution. For the rest of the section we use
the same notation we defined in Section 3. To show uniqueness we make use of complementary
slackness, which implies that for any other optimal solution $Y$, since with high probability
$M^* = D^* + \sum_i x_i^*(R_i + C_i) - A(G) - Z^*$ is an optimal solution of the dual program, we have that

$$Y \bullet M^* = 0.$$

But it is easy to see from the proof of Lemma 3.2 that we can make the stronger statement that the
subspace $\mathbb{R}_k$ is the null space of $M^*$ and on the perpendicular subspace $\mathbb{R}_{n|k}$ the lowest eigenvalue
is strictly greater than 0. Combining this with the complementary slackness condition in particular
implies that the span of the columns of $Y$ is restricted to $\mathbb{R}_k$. Hence, the conditions of
the SDP (the sum constraint, the diagonal value constraint and the positivity constraint) force $Y = Y^*$
if the column space of $Y$ is contained in $\mathbb{R}_k$, which proves uniqueness.

A.7 Analysis for SDP in (2) 

A.7.1 Proof of Theorem 2.1 

We extend the definitions of Section 1.2 to ease readability. We define the notion of relative degree
$\bar{\delta}$ as the number of edges present minus the number of edges not present. In this light
we define the following quantities, extending the definitions from Section 1.2:
$\bar{\delta}_{i \to P_t}$ to be the “degree” of vertex $i$ to cluster $t$. Formally

$$\bar{\delta}_{i \to P_t} = 2\delta_{i \to P_t} - |P_t|$$

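We consider the following SDP in this section. Let $J$ be the $n \times n$ matrix such that $J[i,j] = 1$
for all $i,j$.

$$\begin{array}{ll}
\max & (2A(G) - J) \bullet Y \\
\text{s.t.} & Y_{ii} = 1 \quad (\forall i) \\
 & Y_{ij} \geq -\frac{1}{k-1} \quad (\forall i \neq j) \\
 & Y \succeq 0.
\end{array} \qquad (21)$$

The dual of the above SDP is as follows

$$\begin{array}{ll}
\min & \operatorname{Trace}(D) + \frac{1}{k-1} \sum_{i,j} Z[i,j] \\
\text{s.t.} & D - Z - (2A(G) - J) \succeq 0,
\end{array} \qquad (22)$$

where $Z$ is a symmetric entrywise non-negative matrix with zeros on the diagonal and $D$ is a
diagonal matrix.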

The optimal solution $Y^*$ we have in mind is the matrix with $Y^*_{ij} = 1$ if $i,j$ belong to the same cluster
and $Y^*_{ij} = -\frac{1}{k-1}$ if $i,j$ belong to different clusters. Note that $Y^*$ is PSD and is a valid solution of the
primal. In this case it is easy to see that the value of the SDP is equal to


(2 * A{G) -J)»Y* 



Eu 




^i^Pt 


k-1 


We will exhibit a candidate dual solution $D^*, Z^*$ such that

$$(2A(G) - J) \bullet Y^* = \operatorname{Trace}(D^*) + \frac{1}{k-1} \sum_{i,j} Z^*[i,j]$$

and with high probability $D^* - Z^* - (2A(G) - J) \succeq 0$ if condition (3) of the theorem is satisfied.
Note that this implies through weak duality that $Y^*$ is an optimal solution of (2). The uniqueness of the
solution can be proved in exactly the same way as in Section A.6.4.

Before we define our candidate dual solution we define the following quantity for ease of notation. 


6min = min -A. 


r I \ I /I r\ / e I e '"X'l 

PU) ~ ^j^P(i) + I ~ ~ ^ ™ i^^i^PU) + ^J^P(i) ~ ~ 




^P(j)^P(i) 


(n/k) 


(23) 

We begin by describing the choice of $Z^*$. If vertices $i$ and $j$ belong to the same cluster then
$Z^*[i,j] = 0$; otherwise




h^Pij) Oj^P{t) 


n/k 


+ 


ip(j)^P{i) 


n/k {n/k){n/k) n/k 


= 1-2 






n/k n/k {n/k){n/k) 


^min 

n/k 
Note that by definition (23) $Z^*$ is a symmetric non-negative matrix. We now define the diagonal
matrix $D^*$ as

D*[i, i] = 5in{i) + Smin = 2 “ max - 

A simple calculation now shows the first required property that 


Trace(A)) -h ^ Z[i,j] = ^ | 6in{i) 




k-1 




= (2 * A{G) -J)»Y* 


We now proceed to show that $D^*, Z^*$ is a valid dual solution, i.e.

$$M^* = D^* - Z^* - (2A - J) \succeq 0.$$

To see this consider the following extension of the decomposition of the space $\mathbb{R}^n$ defined in Section
3.

Definition A.10. Given a $k$-clustering of $n$ vertices $\{P_t\}_{t=1}^{k}$ we define the vectors $v_t$ to be the
indicator vectors of the clusters. We further define the following subspaces, which are perpendicular
to each other, and partition $\mathbb{R}^n$.

• $\mathbf{1}$: the span of the vector with 1 in each coordinate

• $\mathbb{R}_{k-1}$: the $k - 1$ dimensional subspace such that for every vector $v \in \mathbb{R}_{k-1}$, $v(i) = v(j)$ if
$P(i) = P(j)$ and $\langle v, \mathbf{1} \rangle = 0$

• $\mathbb{R}_{n|k}$: the subspace perpendicular to $\mathbb{R}_{k-1} \cup \mathbf{1}$, i.e. the subspace where the sum on each cluster
is equal to 0.

Following are two easy observations that follow from simple calculations similar to the
calculations shown in Section 3.

Observation A.11. $(\forall v \in \mathbb{R}_{k-1})$ $(D^* - Z^* - (2A - J))v = 0$
Observation A.12. $(\forall v \in \mathbb{R}_{n|k})$ $Z^* v = 0$

We first focus on the subspace $\mathbb{R}_{n|k}$ and show that $\forall x \in \mathbb{R}_{n|k}$

$$x^T (D^* - Z^* - (2A - J)) x = x^T (D^* - 2A) x \geq 0 \qquad (24)$$


The proof of the above statement follows from the following set of inequalities 


x^{D* — 2A)x > miuj 

D*[i,i] — 2 max x^ A{G)x 

X 


> 

2 minz/(i) — 2 maxx^x4(G)x 

i X 


> 

2 ^minp(f) — c (^\/pn/k + qn + + 

Vlog(n)) 

> 

0 



where the second inequality above follows from substituting the values of (fj_ 5 .pp) in terms of (5j_^p(t) 
in the expression for D*[i,i] and using the definition of ^{i). The second inequality follows from 
Lemma A.6 and the third inequality follows from condition (3). Note that if in condition (3) we
assume the constant to be $c + 1$ instead of $c$, then we get the stronger property that the above quantity
is in fact greater than $\sqrt{\log(n)}$ and not just positive. We use this below.

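The above analysis shows that the matrix $M^* = D^* - Z^* - (2A - J)$ is PSD on the subspace
$\mathbb{R}_{n|k}$. Let us now focus on a vector $y \in \mathbb{R}_{n|k} \oplus \mathbf{1}$. Let $H^* = D^* - Z^* - 2A = M^* - J$. By appropriate
scaling we can consider any $y = x + \delta \frac{\mathbf{1}}{\sqrt{n}}$ (see footnote), where $x \in \mathbb{R}_{n|k}$ is a unit vector and $\delta \geq 0$.

In the analysis above we explained that $x^T H^* x \geq \sqrt{\log(n)}\,\|x\|^2 = \sqrt{\log(n)}$. With these facts in
place consider $y^T M^* y$:

$$\begin{aligned}
y^T M^* y &= x^T H^* x + \frac{\delta^2}{n} \mathbf{1}^T J \mathbf{1} + 2 x^T H^* \frac{\delta}{\sqrt{n}} \mathbf{1} \\
&\geq \sqrt{\log(n)} + \delta^2 n - 2\delta \|H^*\|,
\end{aligned}$$

where we use the fact that for a unit vector $x$, $x^T H^* \frac{\mathbf{1}}{\sqrt{n}} \geq -\|H^*\|$. Therefore as long as we have
that $4\|H^*\|^2 < 4n\sqrt{\log(n)}$ we have that $y^T M^* y > 0$ (as the expression is a quadratic in $\delta$).
Therefore we need to control the spectral norm of $H^*$. We can show the above via very simple and
fairly loose calculations:

$$\begin{aligned}
\|H^*\| &\leq \|D^*\| + 2\|A\| + \|Z^*\| \\
&\leq \max_i D^*[i,i] + 2\delta_{max} + O(\delta_{max}) \\
&\leq O(\delta_{max}),
\end{aligned}$$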

where 5max is the degree of the vertex with maximum degree in the graph G. The above equation 
follows with very loose aproximations from the definitions. A simple chernoff bound shows that 
with high probability 6max < + kqm + y/pm + kqm log(n) < 0 (fclog(n) + log^/^(n)) where we 

have replaced p with a and q with (3 which implies that ||iZ*|| < \/n which completes 
the proof since we have shown that M* is PSD. 

A.8 Monotone Adversary - Proof of Theorem 4.1 

Proof. We consider the SDP relaxation (4) as in the proof of Theorem 1.1. Let $Y^*(G)$ be the
optimal solution of the SDP when we run it on the graph $G$. Now suppose $G \sim \mathcal{G}_{p,q,k}$. The proof
of Theorem 1.1 shows that with high probability, $Y^*(G)$ is unique and it corresponds to the hidden
partition. Suppose this event happens; we then show that for any graph $G_{adv}$ generated by the
monotone adversary after acting on $G$, $Y^*(G_{adv})$ is also unique and it is equal to $Y^*(G)$. This will
prove Theorem 4.1.

Define $SDP_G(Y)$ to be the objective value (corresponding to the graph $G$) of a feasible matrix
$Y$, i.e. $SDP_G(Y) = A(G) \bullet Y$. Note that since $Y$ has only nonnegative entries (since it is a feasible
solution) we have that $A(G') \bullet Y \le A(G) \bullet Y$ if $G'$ is a subgraph of $G$. Also, since $Y \succeq 0$ and its
diagonal entries are $Y_{ii} = 1$, we have that $|Y_{ij}| \le 1$. Therefore $A(G \cup e) \bullet Y \le A(G) \bullet Y + 2$. Suppose
the monotone adversary adds a total of $r^+$ edges and removes $r^-$ edges. From the monotonicity
of the adversary it is easy to see that $A(G_{adv}) \bullet Y^*(G) = A(G) \bullet Y^*(G) + 2r^+$. However, for any
other solution, by the argument above we have that $A(G_{adv}) \bullet Y \le A(G) \bullet Y + 2r^+$. Also, by our
assumption we have that $A(G) \bullet Y^*(G) > A(G) \bullet Y$ for any feasible $Y \ne Y^*(G)$. Putting it together
we have that

\[
A(G_{adv}) \bullet Y^*(G) = A(G) \bullet Y^*(G) + 2r^+ > A(G) \bullet Y + 2r^+ \ge A(G_{adv}) \bullet Y,
\]

for any feasible $Y \ne Y^*(G)$, which proves the theorem. □

Footnote (on the decomposition $y = x + \delta \frac{\mathbf{1}}{\sqrt{n}}$): Indeed, by definition any vector $y \in \mathbb{R}_{n|k} \oplus \mathbf{1}$ can be written as $x + \delta \frac{\mathbf{1}}{\sqrt{n}}$ for some $\delta$ and $x \in \mathbb{R}_{n|k}$. For the purpose
of proving positive definiteness we can always divide by any positive number, and can therefore assume that $x$ is a unit vector. Also
note that we can consider $y$ or $-y$ equivalently, and hence can consider the case when $\delta > 0$.
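The counting step in the proof of Theorem 4.1 can be checked by hand on a toy instance. The sketch below is illustrative only: the six-vertex graph, the adversary's moves, and the helper names `partition_matrix` and `objective` are all hypothetical, and it compares only against partition matrices of balanced partitions (the integral feasible points), not against every feasible $Y$ of the SDP.

```python
from itertools import combinations

def partition_matrix(labels):
    """Cluster matrix Y with Y[i][j] = 1 if i and j share a label, else 0."""
    n = len(labels)
    return [[1 if labels[i] == labels[j] else 0 for j in range(n)] for i in range(n)]

def objective(edges, Y):
    """A(G) . Y = sum_{i,j} A_ij * Y_ij; each undirected edge is counted twice."""
    return sum(2 * Y[i][j] for (i, j) in edges)

# Hypothetical hidden partition: two clusters of size 3.
hidden = [0, 0, 0, 1, 1, 1]
Y_star = partition_matrix(hidden)

# Toy base graph G: a path on 6 vertices (one edge crosses the clusters).
G = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)}

# Monotone adversary: adds intra-cluster edges, removes inter-cluster edges.
added = {(0, 2), (3, 5)}
removed = {(2, 3)}
G_adv = (G | added) - removed
r_plus = len(added)

# The hidden solution gains exactly 2 per added edge and loses nothing.
star_ok = objective(G_adv, Y_star) == objective(G, Y_star) + 2 * r_plus

# Every other balanced partition gains at most 2 per added edge,
# and stays strictly below the hidden solution on G_adv.
others_ok = True
for side in combinations(range(6), 3):
    labels = [0 if v in side else 1 for v in range(6)]
    Y = partition_matrix(labels)
    if Y == Y_star:
        continue
    others_ok = others_ok and objective(G_adv, Y) <= objective(G, Y) + 2 * r_plus
    others_ok = others_ok and objective(G_adv, Y) < objective(G_adv, Y_star)
```

On this instance the hidden partition scores 8 on $G$ and 12 on $G_{adv}$, matching $A(G) \bullet Y^*(G) + 2r^+$ with $r^+ = 2$.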


A.9 Forms of Chernoff Bounds and Hoeffding Bounds Used in the Arguments

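Theorem A.13 (Chernoff). Suppose $X_1, \ldots, X_n$ are independent random variables taking values in
$\{0,1\}$. Let $X$ denote their sum and let $\mu = E[X]$ be its expectation. Then for any $\delta > 0$ it holds
that

\[
P(X > (1+\delta)\mu) < \left( \frac{e^{\delta}}{(1+\delta)^{(1+\delta)}} \right)^{\mu}, \tag{25}
\]

\[
P(X < (1-\delta)\mu) < \left( \frac{e^{-\delta}}{(1-\delta)^{(1-\delta)}} \right)^{\mu}. \tag{26}
\]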

A simplified form of the above bound is the following formula (for $\delta < 1$):

\[
P(X \ge (1+\delta)\mu) \le e^{-\delta^2 \mu / 3},
\]

\[
P(X \le (1-\delta)\mu) \le e^{-\delta^2 \mu / 2}.
\]
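As a sanity check, the simplified Chernoff bounds can be compared with exact binomial tails. The sketch below assumes the standard simplified forms $e^{-\delta^2\mu/3}$ (upper tail) and $e^{-\delta^2\mu/2}$ (lower tail); the parameters $n = 200$, $p = 0.1$ and all function names are illustrative choices, not taken from the paper.

```python
import math

def binom_pmf(n, p, j):
    """P(X = j) for X ~ Binomial(n, p)."""
    return math.comb(n, j) * p**j * (1 - p)**(n - j)

def upper_tail(n, p, a):
    """Exact P(X >= a); a small tolerance guards against float round-off in a."""
    start = max(0, math.ceil(a - 1e-9))
    return sum(binom_pmf(n, p, j) for j in range(start, n + 1))

def lower_tail(n, p, a):
    """Exact P(X <= a); a small tolerance guards against float round-off in a."""
    stop = min(n, math.floor(a + 1e-9))
    return sum(binom_pmf(n, p, j) for j in range(0, stop + 1))

def chernoff_upper(mu, delta):
    """Assumed simplified upper-tail bound exp(-delta^2 * mu / 3)."""
    return math.exp(-delta**2 * mu / 3)

def chernoff_lower(mu, delta):
    """Assumed simplified lower-tail bound exp(-delta^2 * mu / 2)."""
    return math.exp(-delta**2 * mu / 2)

n, p = 200, 0.1          # illustrative parameters
mu = n * p

checks = []
for delta in (0.1, 0.25, 0.5, 0.9):
    up_exact = upper_tail(n, p, (1 + delta) * mu)
    lo_exact = lower_tail(n, p, (1 - delta) * mu)
    checks.append((delta,
                   up_exact <= chernoff_upper(mu, delta),
                   lo_exact <= chernoff_lower(mu, delta)))

for delta, up_holds, lo_holds in checks:
    print(f"delta={delta}: upper bound holds={up_holds}, lower bound holds={lo_holds}")
```

The exact tails are typically far smaller than the bounds, which is consistent with the bounds being used here only for loose, high-probability estimates.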

Theorem A.14 (Bernstein). Suppose $X_1, \ldots, X_n$ are independent random variables taking values in
$[-M, M]$. Let $X$ denote their sum and let $\mu = E[X]$ be its expectation. Then

\[
P(|X - \mu| > t) \le \exp\left( -\frac{1}{2} \cdot \frac{t^2}{\sum_i E[(X_i - E[X_i])^2] + Mt/3} \right).
\]

Corollary A.15. Suppose $X_1, \ldots, X_n$ are i.i.d. Bernoulli variables with parameter $p$. Let $\sigma =
\sigma(X_i) = \sqrt{p(1-p)}$; then we have that for any $r > 0$

\[
P\left( X > \mu + \alpha\sigma\sqrt{n\log(r)} + \alpha\log(r) \right) \le e^{-\frac{\alpha\log(r)}{4}}.
\]

Proof. We have that $n\sigma^2 = np(1-p)$ and $M = 1$. We can now choose $t = \alpha\sigma\sqrt{n\log(r)} + \alpha\log(r)$.
This implies that $\frac{t^2}{2(n\sigma^2 + t/3)} \ge \frac{\alpha}{4}\log(r)$, which implies from Theorem A.14 that

\[
P\left( X > \mu + \alpha\sigma\sqrt{n\log(r)} + \alpha\log(r) \right) \le e^{-\frac{\alpha\log(r)}{4}}. \qquad \square
\]
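The algebra behind the exponent in Corollary A.15 can be sketched as follows. This is a reconstruction under stated assumptions, not the authors' derivation: it assumes $\alpha \ge 1$, $\log(r) > 0$, and Bernstein's inequality in the form $P(X - \mu > t) \le \exp\left(-\frac{t^2}{2(n\sigma^2 + t/3)}\right)$ with $M = 1$. Since $t = \alpha\sigma\sqrt{n\log(r)} + \alpha\log(r)$ is at least each of its two summands,

\[
n\sigma^2 \le \frac{t^2}{\alpha^2\log(r)}, \qquad \frac{t}{3} \le \frac{t^2}{3\alpha\log(r)},
\]

and therefore

\[
n\sigma^2 + \frac{t}{3} \le \frac{t^2}{\alpha\log(r)}\left(\frac{1}{\alpha} + \frac{1}{3}\right) \le \frac{4}{3}\cdot\frac{t^2}{\alpha\log(r)},
\qquad\text{so}\qquad
\frac{t^2}{2(n\sigma^2 + t/3)} \ge \frac{3}{8}\,\alpha\log(r) \ge \frac{\alpha\log(r)}{4}.
\]

The same chain goes through for any $\alpha \ge 3/5$, since all that is needed is $\frac{1}{\alpha} + \frac{1}{3} \le 2$.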


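Theorem A.16 (Hoeffding). Let $X_1, \ldots, X_n$ be independent random variables. Assume that the $X_i$
are bounded in the interval $[a_i, b_i]$. Define the empirical mean of these variables as

\[
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i;
\]

then

\[
P\left( |\bar{X} - E[\bar{X}]| \ge t \right) \le 2\exp\left( -\frac{2n^2t^2}{\sum_{i=1}^{n}(b_i - a_i)^2} \right). \tag{27}
\]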